Seven challenges to getting the Common Core tests right


Morgan Polikoff

The rollout of the Common Core standards offers California – and most of the nation – an opportunity to address some of the issues that have plagued education reform in the past. Foremost among these issues is the generally poor quality of state assessments of student achievement and a resulting negative effect on instruction.

State tests in the No Child Left Behind (NCLB) era tended to be: a) highly procedural, ignoring the conceptual skills in the standards, b) heavily or exclusively multiple-choice, and c) predictable in their coverage of a narrow slice of content in the standards. These features undoubtedly contributed to the narrowing effects of the NCLB law, leading teachers to spend substantial time in test preparation and focus heavily on English and math at the expense of other subjects.

California’s new assessments will come from the Smarter Balanced Assessment Consortium, one of two federally funded consortia designed to measure student mastery of the Common Core. While there are promising signs about these new assessments, they also present several challenges. In a recent report for the Center for American Progress, I laid out seven of the most important challenges that must be addressed if the new assessments are to live up to their promise and support effective standards implementation.

The first challenge is making the case for and standing firm on higher definitions of proficiency. One of the goals of Common Core is to set more accurate definitions of proficiency so that, for instance, “proficient” students can enroll in college without remediation. Setting a higher target means that more students will be labeled as below proficient. As we have seen in New York, lower rates of student proficiency than parents and educators were used to seeing under the former state standards can sometimes produce political blowback. Making the case for the new, more rigorous targets, perhaps with public-service announcements, op-eds and targeted mailings to parents and educators, may help reduce backlash.

The second challenge is meeting the technological needs of the new assessments. The consortium has guidelines on the technology needed to administer the new computer-adaptive assessments. These technology upgrades will be somewhat costly at first, though perhaps not compared to total K-12 spending. Schools and districts should embrace this requirement by making thoughtful purchases that can also serve instructional purposes beyond assessment.

The third challenge is scoring new test items that move beyond multiple choice to ask students to compose answers to complex or real-world problems. The new item types are essential to improving the quality of the assessments, yet scoring them can be difficult. Getting reliable scores from human raters is expensive and time-consuming, and the technology has not yet advanced sufficiently to allow for computer scoring of nuanced writing elements.

The fourth challenge is ensuring the tests truly cover the full range of content in the standards. As mentioned above, this was not the case with prior tests. To meet this challenge, the consortia will need to construct quality items both for hard-to-assess skills and for more advanced levels of cognitive demand (i.e., moving beyond memorization and procedures to application and generalization). Again, constructed-response items that ask students to analyze or solve complex problems will be essential here.

The fifth challenge is minimizing the testing time burden. There is clearly a growing movement against the amount of testing in schools. The new assessments will take somewhat more time than the old ones, largely because they’ll measure more complex skills. Educators and policymakers should make the case to the public and to parents for the value of higher-quality tests that provide feedback on a wider range of student skills. Districts have a role to play on testing time as well – they should evaluate all their testing activities and reduce or eliminate those that are not essential.

The sixth challenge is validating the assessments for new uses for school and teacher accountability. This is less of an issue in California than other states, because California has resisted the federal push for stricter teacher evaluation. Given the many questions about new teacher evaluation systems being rolled out in other states, this appears to have been a prudent move. In the long term, teacher evaluation can clearly be improved, but there is little sense in rushing untested reforms during Common Core rollout. Nevertheless, all decisions made using test data need clear, appropriate validity arguments.

The seventh challenge is managing the rollout of the new tests alongside other new policies that are happening simultaneously. In California, the two most important K-12 policies happening now are Local Control Funding and the Common Core implementation. Given the potential blowback resulting from the new assessments, state policy leaders should err on the side of caution when using assessment results to make high-stakes decisions about students, teachers or schools in the early years of new tests. This will allow the new policies to be implemented and mature more carefully.

This is far from a comprehensive list of issues (for instance, I did not mention the challenge of accommodating students with disabilities and English learners on the new tests), but it should give food for thought to policymakers. More detail on these issues, as well as concrete suggestions for policy and practice, can be found in the full report.

Regardless of one’s stance on the Common Core, all can agree that, if we are going to have strong common standards, it is essential that we get implementation right. Focusing on assessment quality and these seven issues is a good place to start.

Morgan Polikoff is an assistant professor of education at the USC Rossier School of Education. He studies standards, assessment and accountability policies.

EdSource welcomes commentaries representing diverse points of view. The opinions expressed in this commentary represent those of the authors. If you would like to submit a commentary for EdSource Today, please contact us.





58 Responses to “Seven challenges to getting the Common Core tests right”


  1. Paul Bonner on May 7, 2014 at 7:24 am

    Other challenges… 8. Not totally disrupting the instructional days throughout the month of May… 9. Developing manageable formative tests (that do not impede instructional time) to prepare for summative assessments… 10. Allowing teachers access to results to inform instruction… 11. Getting local districts to quit piling on poorly designed assessments for this false sense of holding schools accountable…

  2. Mariana on May 5, 2014 at 2:38 pm

    In my experience, many students who scored advanced on the STAR test have strong procedural skills, but lack an understanding of which procedures will be useful in a new situation. Once problems require choices of strategies and justification of the decisions made, students need to deeply understand the mathematics and not just follow a recipe to get the answer. I am excited that we are actually expecting students to problem solve using the math they are learning.

  3. Don on May 4, 2014 at 8:24 am

    Why are formative and interim assessments optional? If some districts or classrooms administer only the summative assessments while others avail themselves of all the resources in the SBAC toolkit, won’t some students have a testing advantage based solely upon local policy decisions, the CORE group excluded?

  4. Susan Meier on May 4, 2014 at 7:17 am

    I believe an eighth consideration critical to the success of the CCSS is for the assessments to be effective instructional tools. This requires detailed item analysis available with open test results per item. The assessments must enable the creation of (or ideally come with) parallel, formative mini-assessments to be administered by classroom teachers periodically across the year to chart individual student progress toward the standard over time. Only these types of assessments influence instruction to result in the increase in rigor we need.


    • Manuel on May 4, 2014 at 8:29 pm

      Ms. Meier, your suggestion that mini-assessments be administered throughout the school year may lead to these assessments being used to determine the classroom mark, something that has up until now been the sole responsibility of the teacher.

      If this were to happen, then the Ed Code will have to be changed to allow this. And it raises the possibility that there will be a “standard” by which classroom marks will be defined. Is this desirable in the long run, especially if the assessments are designed to produce a score distribution ruled by the Bell Curve? Who will decide where the cutoff points are set? And what will be done with those who “fail?”

      • Don on May 5, 2014 at 8:49 am

        Susan and Manuel, copied below is the SBAC summary of the formative and interim components that are optional. If for a moment we put aside the other controversies and questions alluded to by Mr. Polikoff and focus on the three components, summative being the last and arguably the most important: if schools/districts opt to implement only the last of the three, won’t their students be at a disadvantage compared to those whose teachers have employed the full range of SBAC assessment and guidance to tweak instruction throughout the year leading up to the summative assessments?

        “Optional interim assessments administered at locally determined intervals. These assessments will provide educators with actionable information about student progress throughout the year. Like the summative assessment, the interim assessments will be computer adaptive and includes performance tasks.

        The interim assessments will:

        Help teachers, students, and parents understand whether students are on track, and identify strengths and limitations in relation to the Common Core State Standards;
        Be fully accessible for instruction and professional development (non-secure); and
        Support the development of state end-of-course tests.

        Formative assessment practices and strategies are the basis for a digital library of professional development materials, resources, and tools aligned to the Common Core State Standards and Smarter Balanced claims and assessment targets. Research-based instructional tools will be available on-demand to help teachers address learning challenges and differentiate instruction. The digital library will include professional development materials related to all components of the assessment system, such as scoring rubrics for performance tasks.”

      • Don on May 5, 2014 at 10:25 am

        Leaving other SBAC/ Common Core controversies aside, if only certain districts or schools avail themselves on a discretionary basis of the formative and interim resources of SBAC such as student/teacher feedback mechanisms and mini-assessments (see description below from SBAC website), won’t students who do not receive those benefits be at a disadvantage to those that do when it comes to the summative assessments, assuming benefit is derived from participation?

        Manuel, I’m not sure Ed Code would have to change. Teachers would still have discretion as to how they want to employ mini-assessments into their grading systems.

        From SBAC:

        *****Optional interim assessments administered at locally determined intervals. These assessments will provide educators with actionable information about student progress throughout the year. Like the summative assessment, the interim assessments will be computer adaptive and includes performance tasks.

        The interim assessments will:

        Help teachers, students, and parents understand whether students are on track, and identify strengths and limitations in relation to the Common Core State Standards;

        Be fully accessible for instruction and professional development (non-secure); and

        Support the development of state end-of-course tests.

        Formative assessment practices and strategies are the basis for a digital library of professional development materials, resources, and tools aligned to the Common Core State Standards and Smarter Balanced claims and assessment targets. Research-based instructional tools will be available on-demand to help teachers address learning challenges and differentiate instruction. The digital library will include professional development materials related to all components of the assessment system, such as scoring rubrics for performance tasks.*****

  5. Tom Sundstrom on May 3, 2014 at 12:44 pm

    Schools are not factories, BUT I just reread Deming’s 14 Key Principles and there’s much to be applied to education:

    Principle number 3 relates to this article, “Cease dependence on inspection to achieve quality. Eliminate the need for massive inspection by building quality into the product in the first place.” …

    The CCSS implementation plan appears centered on testing. The assessments are providing a proxy for content specifications that the standards could not provide. Assuming that the domains of the assessments match the scope and rigor intended by the CCSS, this is a case where teaching to the test is implementation of the standards.

    To follow Deming’s principle and build quality into instruction before inspection, assessment providers need to convey their precise content domain with cognitive levels to educators. This will allow assessment-aligned instruction which will be equivalent to CCSS-aligned instruction assuming the assessments are aligned to the CCSS. Today, there are too many interpretations of standards-derived content and too little information about the assessment domains for any clarity of instructional objectives at the classroom level.

  6. Manuel on May 2, 2014 at 9:09 pm

    Putting aside all the issues about the testing mechanics, I’d like to focus on what exactly is “proficiency.” The author gives a hint:

    “proficient” students can enroll in college without remediation

    This is an interesting definition because it states that, if every high school graduate is proficient, then every high school graduate is college-eligible.

    In California, the lowest college tier is the CSU system, which was designed to serve the upper 33% of California high school graduates. This implies that only the upper 33% of graduates are eligible. But now we seem to be requiring that 100% of graduates be eligible, which does not match the 33% college-eligibility design.

    Thus, the goal that every student be proficient is not attainable when the college entrance requirement is designed to define proficiency only for the upper 33%.

    Either the college entrance requirement is changed to allow 100% of applicants to be eligible, or we admit this will never happen and be done with NCLB.

    Does this make sense?


    • Morgan Polikoff on May 2, 2014 at 9:15 pm

      Actually, this refers to community college.

      • Manuel on May 4, 2014 at 6:58 pm

        Uh, no, that’s not the case.

        Page 73 of “A Master Plan for Higher Education in California, 1960-1975” (available from this web page) recommends that:

        “In order to raise materially standards for admission to the lower division, the state colleges select first-time freshmen from the top one-third (33 1/3 per cent) and the University from the top one-eighth (12 1/2 percent) of all graduates of California public high schools…”

        The Plan also recommended that 100% of students be admitted to what then were called “junior colleges” with 2.4% of their graduates ideally transferring to UC and 2.0% to state colleges (aka CSU). Those eligibility thresholds have not changed, as far as I know.

        Given that, we are still faced with a problem: how can 100% be proficient and thus eligible for college if only 33.3% are to be admitted?

        • navigio on May 4, 2014 at 9:40 pm

          Maybe community college was what they meant when they said ‘college ready’. If that’s true then I guess ‘proficiency’ is defined by a hs diploma.

          • Morgan Polikoff on May 4, 2014 at 9:47 pm

            Proficiency would be defined by proficiency. Plenty of folks have a high school diploma but still have to take remedial courses. The idea is that proficiency signals that you do not have to take remedial courses in community college.

            • navigio on May 5, 2014 at 5:46 am

              my point was that if community college is the goal, but it is ‘designed’ to accept 100% of students, then a hs diploma is the measure of ‘college ready’.
              anyway, i think manuel’s point is: where are all these new college-ready people supposed to go if we don’t change how colleges are designed to support the student population? i don’t expect ‘college ready’ usually means ready for community college, otherwise we’re already there.

            • Morgan Polikoff on May 5, 2014 at 6:48 am

              Unfortunately that wouldn’t be quite right. Yes, HS diploma means you can be admitted to community college, but a substantial proportion of enrolling community college students (who presumably have either a diploma or a GED) have to enroll in remedial courses. I think the number varies across colleges but national figures for community colleges are around 40%.

              But I totally agree with you that a focus on support is also needed.

        • Morgan Polikoff on May 4, 2014 at 9:45 pm

          If “proficient” meant “ready for enrollment without remediation in a community college,” I don’t see how 100% proficiency would conflict with only the top third being admitted to the CSUs. Since California did away with its accountability system, it’s not clear to me that 100% proficiency is the goal anymore, anyway.

          • Manuel on May 6, 2014 at 7:44 am

            You don’t see a conflict, Morgan?

            Putting that aside for the moment, anyone that graduated from high school can attend a community college (formerly known as “junior college”) under the current Master Plan. That’s a fact.

            Will any student admitted be permitted to take the college-transferable courses? Not until they pass the “qualifying” exams that are the gate keepers. Yes, even community colleges have remediation classes. Shocking, isn’t it?

            But we are not talking about community college. We are talking about college, you know, a bachelor’s degree-granting institution. And the lowest in California is the CSU system (sure, we could include the private for-profit colleges, but let’s not). If a student has to take remediation, then s/he is not college-ready, period. It does not matter if the tests administered by California say s/he is.

            And given the admission standards to college, we can claim that 100% are qualified to enter CSU without remediation but if only 33% are accepted even without remediation, then they are not qualified. We can dress the monkey in the best finery, but it is still a monkey.

            • Morgan Polikoff on May 6, 2014 at 7:54 am

              A few points:

              *We* are not talking about four year colleges (me, or the policymakers who crafted this policy). *You* are talking about four year colleges. The goal is that all students would be able to enroll in college without remediation. Community college is college.

              Second, the expectation, if not now then over time (and if not in California, then in other places), is that proficiency on the high school tests will automatically exempt one from remediation. Which is how it should be – the extra entrance exam is an absolutely needless gatekeeper.

            • Manuel on May 6, 2014 at 9:37 am

              Thank you, Morgan, for clarifying that point.

              However, if you and the alleged policymakers who crafted this policy are talking about community college as “college,” then the public has been given a pig in a poke.

              Community college has always been open to anybody with a pulse, and that is why no more than 2.4% of students were expected to transfer to the “upper division.” Nevertheless, it was/is well known in California that the community college transfer route could be as rigorous as that found in CSU and UC, hence the gatekeeper tests. If a student had goofed during high school and blown her/his GPA, this was the way to do it. It was also the route taken by those who could not afford to pay the higher costs of CSU/UC. But this did not take away the fact that students still had to perform at CSU/UC lower-division levels and would be required to “remediate” if they had deficiencies.

              Under the new testing system, students now have to jump the “proficiency” hurdle and you are telling us that this is so they don’t need remediation? All that this does is place the goal posts elsewhere without removing the fact that if all students do not need remediation then they are eligible for a college system that will only take 33% of them. This is a cruel joke, in my opinion.

              Incidentally, I’ve never ever heard that doing away with remediation is the aim of NCLB’s 100% proficient demand that, allegedly, Common Core is now addressing. I find this revelation to be extraordinary and I am glad that EdSource provides a place where this can be learned by the layman like me from the experts in the field like you. Perhaps you can point us to a primary source on that. TIA.

    • el on May 5, 2014 at 12:15 pm

      Great point, Manuel.

      • Manuel on May 6, 2014 at 7:48 am

        Thank you, el.

        This issue has been nagging me for too many years. How can it be possible that UC admits have to have remediation? Who is making up the rules?

        This is not a recent problem. It was there when I was an undergraduate back in the golden days when the SAT was all that mattered. But now that everyone is demanding that every high school graduate be “ready for college or career” (whatever that means), well, there’s hay to be made because few are willing to admit that there are underlying problems.

        And to deconstruct those problems would take way too much time-space and, frankly, we are not getting paid for that. But it is great fun to make an effort, no?

        • Paul Bonner on May 7, 2014 at 7:40 am

          I’m sorry, but this sounds like the Sanhedrin. At this point in time, approximately 30% of adult Americans have a college education. We don’t have enough seats, nor the desire on the part of universities, to serve even 50% of high school graduates. High schools were started at the turn of the 20th century to track college students and get others to jobs through dropping out or vocational programs. We aren’t doing much different today. My 14 year old son said, and I paraphrase, school only serves certain students. If a student is not AP, IB or Honors track, that student does not get the curriculum needed to succeed in college. Of course it would be nice if proficiency meant everyone is prepared for a higher education, but you have to get all of the teachers and, for that matter, every community member on board. We need to change our focus from a systemic to a cellular one. If testing is going to continue as a vehicle for accountability and standards we will never change the focus to each child, therefore, preparing each student for adulthood in a democratic republic. Testing should be used to inform instruction and teachers should be prepared and trusted to make this happen.

  7. el on May 2, 2014 at 4:36 pm

    As for Item 6, I would encourage people to read the report from the American Statistical Association that came out April 8:

    “Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.”

    “This is not saying that teachers have little effect on students, but that variation among teachers accounts for a small part of the variation in schools.”

  8. el on May 2, 2014 at 4:25 pm

    My daughter took the algebra math field test this week and though she is highly proficient (“Advanced” per STAR), she said she randomly guessed most of the questions because they were on material they hadn’t covered or otherwise made no sense to her.


    • Morgan Polikoff on May 2, 2014 at 4:27 pm

      It’s a good thing, then, that these field test results aren’t counting. Ideally, that’s the purpose of the field tests – to iron out the kinks before the tests matter (I don’t believe they’ll really matter next year either, so there’s a lot of opportunity here).

      • el on May 2, 2014 at 4:41 pm

        I get that, but as an engineer, it’s hard for me to understand how you get from a single unsatisfactory field test to a final test instrument. I would want lots of small focus groups and lots of iterations and I would want the first “for real” arrangement to be something less than a statewide (or national) exercise.

        • Morgan Polikoff on May 2, 2014 at 4:43 pm

          Well I do think there’s some of that behind the scenes. But I actually agree with you. A statewide field test isn’t un-useful, but it should be only part of the process.

        • navigio on May 2, 2014 at 5:54 pm

          You don’t. One of my children called the math test ‘only kind of about math’

        • Don on May 2, 2014 at 6:54 pm

          El, as an engineer you know how to get to a final test instrument – trial and error – multiple tests with numerous adjustments along the way and over a length of time that is not predetermined. But there is so much pressure to get this up and running that caution has been thrown to the wind. Since the results won’t be published, the public will have a difficult time finding out how the students fared and/or how well the test measured content knowledge and critical thinking. It may well be that we will never know.

    • Doug McRae on May 2, 2014 at 5:26 pm

      EL: One of the fundamentals for good statewide assessment programs is that instruction must be implemented before assessments are implemented. It is crystal clear that common core instruction has not been implemented yet in CA — at best statewide CA is somewhere between Awareness and Transition on a 3-year 3-step Awareness / Transition / Implementation timeline for implementing common core instruction. For math, we have just approved common core curriculum frameworks and approved instructional materials at the state level within the last 6 months, and it will take local districts up to 2 years to adopt and provide professional development on the specific materials they want to use; thus, it will be spring 2016 at best before CA will have valid reliable comparable fair scores on common core Math tests. For E/LA, common core curriculum frameworks are scheduled for state approval this coming July, and instructional materials are scheduled for November 2015. It will be spring 2017 before CA will have valid reliable comparable fair scores on E/LA common core statewide tests. Yet we are going forward with common core assessments spring 2015 . . . . that makes no sense whatsoever in terms of generating meaningful or useful scores for the next two years. Your daughter (and you, EL) are simply the victims of having to experience tests that precede implementation of the instruction they are designed to measure.

      • el on May 2, 2014 at 6:16 pm

        I totally get that, Doug. Her teacher is actually using new curriculum that is supposed to be common core but of course that’s a moving target. Still, she has a pretty well rounded background and I am pretty surprised that she found it so far outside of her skillset that she was down to what she described as random selection of answers, not even guessing, let alone educated guessing.

      • el on May 2, 2014 at 6:24 pm

        I guess my question, Doug, which you might actually be able to answer, is how does presenting a test where the questions are so out of whack from what the kids can answer actually provide any useful feedback in terms of test design? I can see the value of the dress rehearsal in terms of seeing if the plumbing and infrastructure works, but I can’t see that bright kids randomly selecting answers is going to do anything in terms of question or test design. Would have been better to just input the old STAR questions and then we could pretend to get information about how the computer itself affects how the kids score.

        • Doug McRae on May 2, 2014 at 7:45 pm

          EL: If the entire bunch of SBAC questions are so far out of whack from what kids can answer, then the “field test” will not generate useful information for test development purposes. But, there are many many sets of questions grouped into roughly 50 item sets that each student sees in an item tryout (that’s what this spring’s SBAC “test” really is), probably 30-50 sets, and some will have more difficult questions while others have less difficult questions. So, it’s hard to draw a conclusion about the entire bunch of questions in terms of being way out of whack from an experience with only one set. SBAC is trying out maybe 1500 questions [21,000 / 14 grades-content areas] and expecting to get maybe 500 to 1000 good Q’s for their computer-adaptive item pools for each grade level and content area. They need those 500-1000 Q’s to measure the full range needed to measure all kids. If all Q’s are too hard or all Q’s are too easy, they won’t have a satisfactory item pool for that grade and content area. The question of how the computer affects how kids score is a separate question — there should be robust studies involving final computer-adaptive scores and counterpart paper-pencil scores to determine comparability between scores collected from those two modes to justify any aggregation of scores across the two modes of test administration. From what I understand, SBAC is attempting to judge comparability not at the full test score level but rather at the individual item level, and I’m not at all sure such an approach will provide solid stable comparability adjustments that will be needed for high stakes aggregations and disaggregations.
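The back-of-the-envelope arithmetic in the comment above can be checked with a short sketch. The figures (21,000 questions, 14 grade/content areas, a target pool of 500 to 1,000 questions) are the commenter's rough estimates, not official SBAC numbers, and the variable and function names are illustrative only.

```python
# Rough estimates quoted in the comment above (not official SBAC figures).
TOTAL_ITEMS_TRIED = 21_000       # questions in the spring item tryout
GRADE_CONTENT_AREAS = 14         # grade/content-area combinations
TARGET_POOL_RANGE = (500, 1_000) # "good" questions hoped for per area

# Questions tried out per grade/content area.
items_per_area = TOTAL_ITEMS_TRIED // GRADE_CONTENT_AREAS


def survival_rate(pool_size: int, tried: int) -> float:
    """Share of tried-out questions that must pass screening to reach pool_size."""
    return pool_size / tried


low, high = (survival_rate(n, items_per_area) for n in TARGET_POOL_RANGE)

print(f"{items_per_area} questions tried per grade/content area")
print(f"implied survival rate: {low:.0%} to {high:.0%}")
```

Under these estimates, roughly one-third to two-thirds of the tried-out questions in each area would need to survive screening to fill the adaptive item pool.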

          • navigio on May 2, 2014 at 8:44 pm

            I’m curious, how do they define “good” questions? Is it based on the variance in the responses? Does that imply they know the demographics of the students who were taking the test? While I could see that being a valid approach, it’s also claimed that the big differences are not in the standards themselves but in the frameworks and the strategies for imparting them. Isn’t that still missing the link to Common Core that el refers to? One other important point: Common Core has something like 80% overlap with the old standards, according to some people. So what it seems to be is not so much a disconnect with the standards as a disconnect with the framework and the instructional methodology. It should also be noted that there are districts that have been teaching “common core” for years now. There are other districts that are only starting to get their feet wet just this year.

            • Doug McRae on May 3, 2014 at 9:00 am

              Navigio: To find “good” questions, test developers start with content reviews by multiple experts to make sure a question addresses the target content standard. Once past this screen, questions are included in item tryout studies (like the current SB field test) to collect empirical data from a representative sample of kids, ensuring that all significant subgroups (ethnic, gender, SES, EL, etc.) are included in sufficient numbers. The data from these item tryout studies are analyzed using perhaps a half dozen to a dozen screens to eliminate items that don’t pass empirical muster to contribute in a valid, reliable, fair, comparable way to a total test score.

              Then the fun part starts with the assembly of qualified items into actual test forms. Test construction is like paving a driveway with paving stones of various shapes, widths, lengths, thicknesses, colors, etc., yet the result has to be a rectangle with given dimensions and load-bearing capacity [these are the test validity, reliability, fairness, and comparability criteria that are cited in professional standards for good large-scale tests]. This task is like a giant jigsaw puzzle, with lots of “let’s try this one in this position and see if it fits.” The folks who do it have great expertise solving big jigsaw puzzles. For computer-adaptive tests, each kid eventually gets his/her own unique driveway, built on-line to meet the required rectangle dimensions and load-bearing capacities, but using differing collections of paving stones.

              On your second point on claims that common core standards are not that different from CA’s old standards, in terms of content I think that is pretty accurate, the what we want learned hasn’t changed that much, but the how it should be learned is quite different. Test developers try to be above the “how learned” issue, and rather try to measure what is learned regardless of how it was learned. I’ll address your 3rd point on variation from school to school implementing the common core below.

            • Doug McRae on May 3, 2014 at 9:24 am

              Navigio: Yes, there is great variation from LEA to LEA and school to school in CA on implementing the common core. One description I find useful is categorizing implementation into a 3-part, 3-year rubric in which schools / districts are in an Awareness, Transition, or Implementation phase, with Awareness being activities for becoming familiar with the common core standards themselves, Transition being professional development on the more detailed curriculum frameworks, acquiring instructional materials focused on common core standards, and piloting new common core lesson plans and techniques, and finally Implementation being putting it all together with all teachers in a school or district involved in executing common core instruction. There certainly are a number of schools in CA in the Implementation phase this year, and more now in Transition that can forecast Implementation next year. Statewide supports like curriculum frameworks and recommended instructional materials are lagging, with the Math curriculum frameworks approved Nov 2013 and recommended instructional materials approved Jan 2014, while ELA/ELD curriculum frameworks are up for approval July 2014 and ELA/ELD instructional materials will not be approved until Nov 2015, and these lags are hampering schools and LEAs from moving into the Transition phase more quickly. My view is that the majority of schools in CA are just now moving from the Awareness to the Transition phase (Math running ahead of ELA/ELD in part due to the timing of statewide supports), and the best case scenario is that the majority of schools in CA will not be in the Implementation phase until the 2016-17 school year. As an assessment guy, I advise folks that statewide instruction must come before statewide assessments, or as Dave Gordon (Sacramento Co Supt) has been quoted as saying, “it isn’t fair to test the kids on skills they haven’t been taught.”

            • el on May 5, 2014 at 12:29 pm

              Just my feedback from what I’ve seen of various sample test questions over the years – often in an attempt to be cool and relevant, questions are created that are about Stuff – farming or horses or baseball or whatever. That’s well and good, but rarely are the people who make up these questions as expert as the most expert California child, and often the experts make common mistakes of people who know a topic superficially – like asking for the height of a horse at the eartips, or talking about the slope of a fence as being changes in north/south latitude rather than in elevation.

              If you have the ear of anyone who can implement this, I’d encourage questions like that to get informal consultation with a true subject matter expert (rather than a test making expert) – “Does this question make sense to you, or did we screw something up?”

              Much of successful test taking is having the ability to look past the absurdity of the presented scenario and take it at face value even though it contradicts what you know about the world. (Like in that PISA test item where there were two separate controls for temperature: what moron would design a real world system that worked that way? :-) )

          • el on May 5, 2014 at 12:22 pm

            For 100% proficiency, shouldn’t 100% of the kids find the questions easy?

            I realize I’m being pesky with theory here, but it’s our stated goal that 100% of kids should be proficient, as measured by these exams.

            But we design the exams so that we see a Normal (in the mathematical sense) distribution of results. I.e., if today 100% of kids were proficient, as is the goal, we still would alter the test to get the results we expect, which is a proficiency rate closer to half.

            I find your answers here illuminating and helpful. I am asking hard questions because I want to understand.

            • Morgan Polikoff on May 5, 2014 at 12:25 pm

              Not being pesky at all. I think the confusion is between norm- and criterion-referenced testing. In criterion-referenced testing, there is some criterion (proficiency), and if all of the students meet that criterion, fine. There may well still be a bell curve, but the goal is for the whole curve to be above the criterion (not saying it’s realistic, just saying that the normal curve and “proficiency” are not in conflict).
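The distinction in the comment above can be sketched numerically. This is a minimal illustration that assumes scaled scores are exactly normally distributed, which is an idealization; the cut score, means, and standard deviation are made-up values and the function name is hypothetical. With a fixed criterion, the share labeled proficient moves as the whole curve shifts, even though the curve keeps its bell shape.

```python
from statistics import NormalDist


def fraction_proficient(mean, sd, cut_score):
    """Share of a normally distributed score population at or above a fixed cut score."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(cut_score)


CUT_SCORE = 350  # hypothetical criterion; it stays fixed as the population improves
SD = 50          # hypothetical spread of scaled scores

for mean in (325, 350, 400, 450, 500):
    share = fraction_proficient(mean, SD, CUT_SCORE)
    print(f"mean {mean}: {share:.1%} at or above the cut")
```

A norm-referenced rule would instead define the cut as a percentile of the current population (for example, the median), so the share above it would stay the same no matter how far the curve shifted.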

            • Manuel on May 6, 2014 at 8:00 am

              Uh, Morgan, not to be pesky either, but the CSTs were supposed to be criterion-referenced tests, yet a description of their “construction” was that of a norm-referenced test.

              In addition, description of the STS tests made to the State Board of Ed clearly defined “expected” rates of “achievement” and suggested the placement of the cutoff points to ensure that more than 50% of test takers would be non-proficient.

              So, given this, how can we expect 100% proficient if the test has been designed to ensure that will never happen?

              BTW, if the entire Bell Curve is to be above the proficient criterion, we might as well do away with testing and save ourselves a lot of money, time, and effort. Given that, I’d say the normalized Gaussian and “100% proficiency” are mutually exclusive.

            • Doug McRae on May 6, 2014 at 10:12 am

              Morgan [cc Manuel]: As a test designer / psychometrician with more than 40 years’ experience, I absolutely agree that the normal curve and proficiency are not in conflict. FYI, Manuel’s view posted this morning reflects a long-running disagreement in this space over whether the distribution of scores defines what a test should be labeled; despite many attempts, I and others have not convinced Manuel to dismount this particular horse. For a techy nuance, however, I would not call STAR CSTs criterion-referenced tests. Rather, I call CA’s CSTs and other statewide tests developed in the ’00s standards-based tests because they have properties inconsistent with the purely diagnostic criterion-referenced tests developed in the ’70s through the ’90s. I will acknowledge, however, that this techy distinction is usually reserved for test maker and psychometrician watering holes and not widely addressed in general audience blogs of the 2010s.

            • navigio on May 6, 2014 at 10:52 am

              Doug, I’m with Manuel on this one. In many fields, including education, the criteria used to design ‘criteria-referenced’ tests are not arbitrary. They make assumptions about the ability of the test takers and how they compare to society as a whole, or groups within it, or even to a rock. Dare I call that a “norm-based” criterion?

              Furthermore, tests are adapted to evolving conditions. The questions are changed, the assumptions are changed, even how we scale results changes depending on how people respond to the questions. This is an important point because in some fields what is being measured is constant, however, when it comes to humans and their culture, everything evolves and adapts to the test.

              That means norm-based forces are implicitly (and sometimes even explicitly) built into criterion-referenced and even standards-referenced tests. I don’t believe that test designers don’t notice this. In fact, I expect much of what they do attempts to counteract it.

              Although I appreciate you recognizing that there’s a third type of test referencing, in reality most people who refer to criterion-referenced tests refer to the type you referred to, and especially in the education sphere that likely makes the distinction academic for the reasons listed above.

              In addition, I think this point about how proficiency maps to realistic futures is extremely important. It seems like a fallacy to assume expectations would not adapt to the phenomenon of 100% proficiency. It’s true that working toward a narrowed Gaussian should be our goal, but I think that is clearly distinct from 100% of students being above average.

            • Don on May 6, 2014 at 1:16 pm

              Mr. McRae, not being a testing cognoscente myself, I’m confused by your CRT/SRT distinction. Since we can’t accurately determine whether delivered classroom instruction is standards-based or criterion-based, and there’s great freedom for most teachers, who are largely unsupervised, isn’t the distinction between CRT and SRT academic?

            • Manuel on May 14, 2014 at 9:43 am

              Doug (with cc to Morgan) – Sorry for being late to consider your comments. Other things got in the way, plus the way the server has been giving access has been rather erratic (at one time there was a second page with comments that could not be accessed from this main page).

              Anyway, a normal curve (aka normalized Gaussian) contains, by definition, 100% of the population. If there is a demand that 100% of the population be “proficient,” then the proficient “cut-off” point must be defined at -infinity if one is to be mathematically correct.
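The point about the cut-off can be written out as a short worked formula. This is a sketch that assumes scaled scores follow an exactly normal distribution, which is an idealization (real scaled scores are bounded and only approximately normal). The symbols $\mu$ (mean), $\sigma$ (standard deviation), and $c$ (the proficiency cut score) are introduced here for illustration, and $\Phi$ is the standard normal cumulative distribution function.

```latex
\[
X \sim \mathcal{N}(\mu, \sigma^{2}), \qquad
P(X \ge c) = 1 - \Phi\!\left(\frac{c - \mu}{\sigma}\right)
\]
\[
P(X \ge c) < 1 \ \text{for every finite } c, \qquad
\lim_{c \to -\infty} P(X \ge c) = 1
\]
\[
c = \mu - 3\sigma \;\Longrightarrow\; P(X \ge c) = \Phi(3) \approx 0.9987
\]
```

Under this idealization, exactly 100% proficiency does require an infinitely low cut, while a cut three standard deviations below the mean already leaves about 99.87% of the population above it.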

              To a layman, that means that this is, indeed, Lake Wobegon.

              Yes, as long as this horse is kept alive by the testing industry, I’ll be forever mounted on it because this is a “techy” distinction that makes it dishonest to not allow the scaled score distribution to deviate from an a priori definition.

              OTOH, I am gratified that you do agree that CSTs, given their properties, are not criterion-referenced tests as advertised by CDE. That, sir, was my point from the beginning of this long running disagreement. Why this fact should be confined to the watering holes reserved for the Grand Poobahs of testing is, in my opinion, highly suspect. Shouldn’t the hoi polloi be told the truth? (“For the Earth is hollow, and I have touched the sky!”)

              And in other news, “Value-added scores don’t seem to be measuring the quality and content of the work that students are doing in the classroom.”

              OMG! Who woulda thunk?

            • navigio on May 14, 2014 at 10:27 pm

              wow, what an interesting editorial manuel. bunches to say about that, but one thing jumped out at me that doug may be able to provide feedback on:

              Another important issue is whether better tests would do a better job of measuring teacher quality. The new tests based on the Common Core standards were designed to do that;

              is that true? last i checked we didn’t even have tests yet. even then, i don’t remember that being a design criterion. i do remember the goal was to more deeply measure students’ understanding. i also remember previous comments that placed that type of measurement in conflict with measuring teacher effectiveness.

  9. Don on May 2, 2014 at 4:13 pm

    Challenge 1. Given the recent news about grades as a better predictor of college success, why would Mr. Polikoff consider the number one challenge of CCSS to be even higher testing standards, especially when half of California students are currently below proficient? California already had some of the highest standards of any state.

    2. That we are talking about how to meet the technological needs informs us of our lack of readiness. This is code for spending huge amounts on technology quickly which also means thoughtlessly. Ka-ching!

    3. Regarding scoring, you don’t build half a bridge and only then wonder how to solve the problem of how to build the other half.

    4. Does Mr. Polikoff understand that we are administering the field tests as we speak? It is a little late to be saying that we need to cover more content when we will have no further opportunity to test it.

    5. The idea that we should eliminate other tests that schools administer, some of which are quite well thought out, in order to reduce the testing burden is a way of making sure that CCSS will be universal and uncontested. Mr. Polikoff admits that CCSS increases the burden. He’s more or less making himself an advocate for universal and exclusive CCSS.

    6. Good point

    7. “err on the side of caution when using assessment results to make high-stakes decisions about students, teachers or schools…”

    Wow! Mr. Polikoff just made cases for significant problems in rolling out CCSS and now he has the temerity to suggest that caution should be used? If only caution had been used when considering how to implement CCSS. I give him credit for brass ones.

    Final thought – This is what happens when a national government controlled by large business interests is allowed to implement technically unlawful national standards under the guise of state standards. The faster they get it in place, no matter how bad it is, the more difficult it will be to get rid of.

  10. Michael Butler on May 2, 2014 at 12:01 pm

    Most of the challenges listed pertain to Smarter Balanced tests and not Common Core. I think we should all be more careful about separating the two. Even if Smarter Balanced tests stumble, the shift toward more writing across the curriculum, higher-level thinking, etc. is good pedagogy that we should be celebrating. I am concerned that the linked reforms in standards-based curriculum and instruction will be tarred based on the summative assessment which is being field tested now. Of course they are related, but they are not synonymous.


    • Morgan Polikoff on May 2, 2014 at 4:15 pm

      Yes, I didn’t choose the title of the post. I would have chosen the title “Seven challenges to getting the Common Core tests right”. Sorry for that confusion.

      • Louis Freedberg on May 3, 2014 at 10:27 am

        Good point. We’ll tweak the headline to make that point.

  11. Trish on May 2, 2014 at 10:47 am

    Agreed Slammy! And, what about student growth?

  12. Dr. Candy Beal on May 2, 2014 at 9:43 am

    We educators in NC wonder what our Republican legislature will do next to set us back decades in education. They have already voted no to across-the-board teacher salary increases and continued the freeze on teachers’ salaries that has been in place for 5 years (while at the same time passing a tax break for the wealthy, so that now, with reduced revenue, they cannot give raises), increased class size, taken away additional pay for master’s degrees, eliminated most of the state’s teacher assistants, gone after tenure and offered the top 25% of the teachers in a district $500 to give up their tenure immediately, and increased the number of charter schools (many funded by Republicans in the private school business). Finally, the most recent scheme pondered is to let kids go to any school in the state regardless of their home county.

    I could point out the injustice of the situation for one of my university graduates who has been teaching for 6 years. She came in on the first salary step just as the salaries were frozen. She is now a 6th year teacher at a first year salary. When salaries are defrosted she will go to the second step on the salary scale. Who continues to teach under these circumstances?

    And finally, the legislature is now going after the Common Core. A study of global education shows that top scores on the PISA come from countries whose education systems’ curricula support deep reading, abstract thinking and problem-based learning. Finnish Lessons and The Smartest Kids in the World are just 2 books that spell out the need for exactly what Common Core will test: a level of abstract thinking that is needed to compete on the global scene, not to mention provide kids with personal and intellectual growth and ensure that they will be lifelong learners. In NC, if you can’t pass the test associated with Common Core, we’ll just lower the bar with a state-developed test.

    So, y’all come! Come recruit for teachers in NC. We have some great teachers, but not for long. CB

  13. Slammy on May 2, 2014 at 9:27 am

    These 7 challenges are all about testing. Isn’t learning important? Are we implementing education standards for testing standards? I thought that the goal is to prepare kids for college and career success. If these are the top challenges, it sounds to me like the goal is to prepare kids for testing success.

  14. Morgan Polikoff on May 2, 2014 at 7:29 am

    I’m pretty sure that Smarter Balanced’s rules require that all participating states set the same proficiency cut score (but not necessarily use that definition for the same decisions).

  15. Mike McMahon on May 2, 2014 at 7:27 am

    This column could have been written in the late 90s, when California adopted its own set of standards and accountability. Unfortunately, the Federal government came along and changed the rules for receipt of Federal education dollars, which were based on whatever each State’s definition of proficiency was. Given the new dynamics of national common standards, allowing states to set their own definition of proficiency is going to be interesting.

  16. Morgan Polikoff on May 2, 2014 at 6:59 am

    Some of the “contrived” questions seem to be well in line with what the standards call for (conceptual understanding, multiple approaches including the traditional algorithm), and some do not. I agree that Common Core supporters need to do a better job arguing why conceptual understanding is important. There have been several good pieces on that recently, one on Vox, and a bunch written by math teachers from around the country.


    • Paul Muench on May 2, 2014 at 12:43 pm

      One way to communicate the importance of conceptual understanding more convincingly is to rely on the English language and standard mathematical notation. That way people have a chance at identifying a problem as a practical test of understanding.

  17. Paul Muench on May 2, 2014 at 5:36 am

    The biggest complaints about Common Core have been on the contrived nature of the mathematics questions. The test makers seem to have relied heavily on introducing unconventional notations to introduce an air of critical thinking. Hence the frequent claims that Common Core is the “new new” math. So I’d say the first challenge to convincing parents is making sure the tests test something meaningful.
