Overusing tests for special ed students inflates API scores

 Doug McRae

California’s 2012 Academic Performance Index (API) results, released today, in general show small but steady gains similar to those of the past four years. But a deeper look at the results shows not only inflation contributing to the gains but also a substantial policy shift toward lower expectations for special education students in California.

The API trend data inflation is due to the introduction of a new test for special education students over the past five years: the California Modified Assessments, or CMAs. These tests were introduced to give selected students greater “access” to the statewide testing system, by making tests easier than the regular California Standards Tests (CSTs) given to all other students. When the CMAs were approved in 2007, the plan was that roughly 2 percent of total enrollment (or about 20 percent of special education enrollment) would qualify to take CMAs instead of CSTs. A major criterion for taking a CMA rather than a CST was that a special education student had to score Far Below Basic or Below Basic on a CST the previous year; the decision whether a student should take a CMA or a CST was left to each student’s Individualized Education Program (IEP) team.

Over time, however, the implementation of the CMA program has resulted in almost 5 percent of total enrollment (or close to 50 percent of special education enrollment) taking the easier CMAs. In addition, CMA scores count the same as CST scores for API calculations, even though the state Department of Education acknowledges that the CMA is an easier test. The result has been to inflate reporting of API trend data over the past few years, and more importantly to cause a subtle but substantial lowering of academic standards that we expect for our students with disabilities in California.

Alice Parker, a Sacramento-based national consultant on special education issues, comments “California has more than 600,000 students identified for one of 13 disability categories. Of that number, more than 70 percent are students in disability categories who have average or above average intellectual capabilities, such as a specific learning disability or an emotional disability or an orthopedic impairment. These students should be held to high academic standards and should be tested as any other student with average or above average intellectual ability. To assign these students to easier tests or to opt these students out of an accountability system based on high expectations does major harm to each and every student capable of achieving the higher standards.” Indeed, students who receive “higher” scores due to easier tests will likely not receive the appropriate instruction needed to maximize their learning capabilities.

To dig deeper into this, let’s first look at the data from our statewide testing system for the time period since CMAs were initiated, and then talk about the policy shift in standards for California special education students.

The data

Table 1: Students taking CMA by year & grade span

First, we can look at the CMA participation rates from 2008 through 2012 (see Tables 1, 2, and 3). For these data, we see the following:

  • CMA usage has increased rapidly from 40,000 students when initially introduced for grades 3-5 in 2008 to almost 210,000 students for grades 3-11 in 2012.
  • For 2012, these data translate into 4.9 percent of total enrollment taking CMAs, far greater than the 2 percent goal; they also translate into 46.4 percent of special education enrollment, far greater than the 20 percent goal.
  • When one isolates CMA participation for grades 4 through 8, the percentages are larger: 5.9 percent of total enrollment and 52.3 percent of students with disabilities enrollment.
Table 2: 2012 CST, CMA and CAPA participation rates as a percentage of total enrollment

  • For grades 4-8, administration of CMAs to students with disabilities far outweighs administration of the more rigorous CSTs, contrary to anticipation when the CMAs were approved in 2007. Clearly, something has gone awry during the implementation of the CMAs over the past five years.

Second, we can look at how the CMA program has affected the reporting of statewide assessment program results (see Table 4). These results involve percentages of students scoring Proficient and Above on the CSTs. When more than 200,000 special education students with low scores on a CST are removed from the calculations, the reported percentages increase artificially.

This factor is easily understood as simply taking low-scoring students out of the calculations, and – bingo! – the averages for the remaining students go up.

Not to identify this contributing cause for increasing results is disingenuous. The data show that reported statewide assessment program results are inflated by about 25 percent over time.
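The arithmetic behind this effect can be illustrated with a toy example (all scores and the cut point below are made up, not actual CST data):

```python
# Toy illustration with hypothetical scores: removing low scorers from
# the pool raises the reported average even though no student improved.
scores = [220, 240, 260, 310, 350, 380, 400, 420]  # pretend scale scores
cutoff = 300  # pretend cut point below which students are reassigned

mean_all = sum(scores) / len(scores)
remaining = [s for s in scores if s >= cutoff]  # low scorers moved to an easier test
mean_remaining = sum(remaining) / len(remaining)

print(mean_all)        # 322.5
print(mean_remaining)  # 372.0
```

No individual score changed, yet the reported average for the remaining test-takers jumps by nearly 50 points.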

Table 3: 2012 CST, CMA, and CAPA participation rates as a percentage of enrollment of students with disabilities

Third, we can look at how the CMA program has affected the reporting of API results (see Table 5). For this analysis, we only have data from the elementary school and middle school grades; while CMAs were finalized for the high school grades in 2011, the use of CMAs at the high school grades has not fully matured as yet. These data show that the gains in API scores reported over the past five years have been inflated by 15 points or 39 percent for elementary schools and 12 points or 27 percent for middle schools.

Again, the reporting of inflated API trend data is disingenuous. Also, by giving the same weights to CMA and CST scores for API calculations, the accountability system provides motivation for districts to administer more of the easier CMAs to artificially boost API results. This motivation may explain at least to some degree the overuse of CMAs at the district and school level.

Finally, it is interesting to look at CMA participation rates by local districts. (Click here for a breakdown of the CMA participation rates of 412 districts in 19 counties.)

Table 4: CST reported gains vs CST gains adjusted for inflation

These data show some extreme cases of very high use of CMAs by some large local districts, as well as cases of moderate use by other large districts:

  • San Bernardino administered CMAs to 76 percent of their students with disabilities enrollment; Fresno, Desert Sands, Santa Ana, Sweetwater High School District, Corona Norco, and Palmdale all administered CMAs to between 70 and 74 percent of special education students.
  • San Ramon Valley, Clovis, Downey, Irvine, Chino Valley, and Poway all administered CMAs to less than 30 percent of their students with disabilities.
  • At the county level, Riverside County districts administered CMAs to almost 7 percent of their total enrollments, or 63 percent of their special education enrollments, more than 3 times the anticipated rates.
Table 5: CMA inflation effect on API scores in 2012. This table compares gains in statewide API scores as reported by the state superintendent with APIs re-calculated to adjust for the introduction of CMA scores, and calculates the API inflation attributable to the introduction of CMAs over the past five years.

These local district CMA participation rates provide compelling evidence that something is haywire. The target is to have 20 percent of special education students take CMAs; to have local districts test more than 75 percent indicates that something other than simple judgment of what is best for the student is at work. My suspicion is that two factors contribute to these data: First, when given a choice between an easier test and a more rigorous test, human nature gravitates toward the easier one; second, when given an opportunity to boost a district accountability score, adult administrators find a way to tilt individual IEP team decisions in that direction.

Policy shift for students with disabilities

How did California quietly lower expectations for half the special education students in the state? What can California do to address this “under the surface” change for our expectations for special education students?

When California’s current statewide assessment system was initially designed, there wasn’t much attention to separate tests or provisions for special education students. The first discussions for special education students involved defining and implementing various accommodations (alterations in testing formats that do not affect the validity of scores) and modifications (alterations in testing format that do affect validity of scores, such as reading an English language arts test to a student). Then, in the early 2000s, experts agreed that it would be inappropriate to give the more rigorous CSTs to about 10 percent of special education enrollment, or 1 percent of total enrollment – those students with severe cognitive challenges. As a result, a so-called 1 percent test was developed, the California Alternate Performance Assessment (CAPA) targeted for these students. During this time period, policy discussions clearly supported the notion that special education students needed to meet the same academic standards as non-special education students, in order to maximize achievement for special education students.

When the federal government changed its assessment policy for students with disabilities in 2006, it allowed for so-called 2 percent tests, which measured the same academic content standards as the mainstream standards-based tests that were required for No Child Left Behind (NCLB) but had modified achievement standards. This is technical jargon for lower actual achievement levels, or in effect easier tests. The feds indicated these tests should be targeted for only 2 percent of the total enrollment in a state – the next-lowest 2 percent above the 1 percent of severe cognitive disability students targeted for CAPA. About half of the states then set out to design modified tests, or so-called 2 percent tests, for selected special education students. California was among those states, and it took from 2007 to 2011 for the CMAs to be developed and phased in. Unfortunately, California used the same performance category labels for the new CMAs as were used for the more rigorous CSTs, and counted CMA scores the same as CST scores for API calculations. These assessment and accountability decisions have resulted in overuse of CMAs as well as inflated API results.

Other states have handled the introduction of tests for students with disabilities in a better fashion. For example, Massachusetts uses different labels for the performance levels of its differing tests. Different performance category labels would signal to personnel in districts and schools, as well as to students and parents, that CMA scores are different from CST scores.

Tennessee uses the same labels for modified assessments as mainstream assessments, but uses a different scale score system for the two tests. This strategy is the same as the strategy that California used when CAPAs were introduced in the early 2000s – CAPA has a two-digit scale score system, while our CSTs use a three-digit scale score system. The scale scores appear on individual student reports, thus alerting parents and students and teachers and administrators that the two tests are indeed different.

If California wants to address better use of CMAs in schools, then clearly we should either change the performance level labels or change the scale score metrics, or perhaps change both, in order to better communicate the meaning of the results of these assessments. Also, it would help the IEP teams immensely if they had information on the “impact” of assigning a student to an easier CMA. For instance, if an IEP team had information that assigning a CMA meant that the student only had, say, a 20 percent chance of earning a high school diploma while continuing to strive for the higher standard represented by a CST would give the student, say, a 70 or 80 percent chance of earning a high school diploma, then IEP teams would be less likely to assign CMAs at the rates they do now. This information would simply be truth in advertising for CMAs.

If California wants to address the CMA inflation factor for assessment and accountability results reported each year, this can be done relatively easily. For assessments, it’s a matter of acknowledging the inflation factor associated with CST data when scores from lower-scoring students are removed from the base CST results that are being reported. For accountability, it’s a matter of assigning lower weights for API calculations for CMA vs CST scores, an adjustment to reflect the fact that CMA scores represent lower achievement levels than counterpart CST scores.
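A down-weighting adjustment of the kind described above can be sketched as follows. The weights and counts here are purely illustrative assumptions, not the CDE’s actual API formula:

```python
# Illustrative only: a simple index in which a CMA "Proficient" counts
# for less than a CST "Proficient". Both weights are hypothetical.
CST_WEIGHT = 1.0
CMA_WEIGHT = 0.7  # assumed discount reflecting the easier test


def weighted_proficiency(cst_proficient, cst_total, cma_proficient, cma_total):
    """Percent proficient, with CMA results down-weighted."""
    points = cst_proficient * CST_WEIGHT + cma_proficient * CMA_WEIGHT
    return 100 * points / (cst_total + cma_total)


# Hypothetical example: 500 of 900 CST takers proficient,
# 60 of 100 CMA takers proficient.
print(round(weighted_proficiency(500, 900, 60, 100), 1))  # 54.2
```

With equal weights the same numbers would yield 56.0 percent proficient, so the discount removes some of the boost a district gets from shifting students onto the easier test.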

While critical of the implementation of CMAs over the past five years, I am not opposed to the CMA as a strategy to get more meaningful individual student test scores for selected students with disabilities. Rather, I am critical of the assessment and accountability practices that have allowed for inflated reporting of annual assessment and accountability results, and fostered gross overuse of CMAs by local districts. Appropriately implemented, the CMA strategy may well be better than the computer-adaptive tests now being proposed to replace CMAs.

In 2011, U.S. Secretary of Education Arne Duncan weighed in on the issue of modified or 2 percent tests for students with disabilities when he declared he would not support modified tests “that obscure an accurate portrait of America’s students with disabilities.” Rather, he said that “Students with disabilities should be judged with the same accountability system as everyone else.” With these statements, Duncan joined others opposing the “soft bigotry of low expectations” that silently plagues many otherwise well intentioned education initiatives.

We should pay attention to viewpoints from those individuals most affected by our statewide policies for students with disabilities. A year ago, when CMA issues were discussed by the Advisory Commission on Special Education, student member Matthew Stacy listened to the discussion and made several powerful statements on behalf of special education students on the topic of statewide tests. “It is unfair not to hold students with disabilities to the same standards as students without disabilities,” he said, adding, “students with disabilities resent being held to lower standards.”

“All that is needed is to make sure students with disabilities have all the necessary accommodations and modifications specified in their IEPs and then hold those students to the same standards as all other students,” he said.

Sometimes it takes the wisdom of youth to cut to the chase and keep adults on target for good assessment and accountability system policies and practices.

Doug McRae is a retired educational measurement specialist living in Monterey. In his 40 years in the K-12 testing business, he has served as an educational testing company executive in charge of design and development of K-12 tests widely used across the US, as well as an adviser on the initial design and development of California’s STAR assessment system. He has a Ph.D. in Quantitative Psychology from the University of North Carolina, Chapel Hill.


Filed under: Commentary, Special Education, State Education Policy, Testing and Accountability



35 Responses to “Overusing tests for special ed students inflates API scores”


  1. Mom of teen with SLD on Mar 4, 2014 at 5:11 pm

    As a mother of a child with a specific learning disability (SLD) who has been in the LAUSD system for several years, I cannot agree that students with a SLD should be expected to take the same CST as the mainstream students. I’m sure that Alice Parker means well, but her statement above that students with specific learning disabilities are able to take the same test issued to students without a SLD is a huge overgeneralization. A significant number (or even most) of these students, including my son, simply cannot process the information as it is presented in this test, especially under time constraints. My son tried to do this a few years ago and it simply did not work due to his disability. These findings were confirmed by his school, and we also had him tested by a licensed neurologist who arrived at the same conclusion. Our school recommended the CMA for testing, and based on my son’s learning disability, it is fair and reasonable for them to recommend this. For many students with SLD, a modified test is appropriate and fair and enables them to access the material.

  2. navigio on Sep 4, 2013 at 8:18 am

    FYI, CMA rates continued to climb this year, though the increase was only about half of last year’s. This means this year’s lower results came despite that increase.

  3. Navigio on Oct 28, 2012 at 1:11 pm

    Btw, one question for people: What is driving this? Is the thought that district representation on the IEP committee is pushing this, either with the resulting inflation as a goal or simply because they feel it might be more appropriate for the child? Or is this simply the collective result of many individual decisions that have something other than inflation as the goal?

  4. Replies

    • Dennis on Oct 24, 2012 at 8:58 am

      Hi Katy – First, I want to caution that my calculations are estimates and that there shouldn’t be too much emphasis placed on the absolute numbers. With that disclaimer, I agree that the API scores have been inflated since the introduction of the CMAs. I don’t think there’s a cumulative effect (Doug – please correct me if I’m wrong), but 12 points (actual API includes the SCF so the difference is 801 – 789) is still a big deal. For the SFUSD, 12 points would correspond to an “inflation effect” of 50%.

      • Doug McRae on Oct 24, 2012 at 11:17 am

        Dennis, Katy — Sounds like the API/CMA issue is getting a workout in San Francisco. Good — one of the functions of accountability data is to surface more substantive issues for policymaker attention. In this case, the larger issue is whether a good portion of the special education enrollment is receiving instruction based on presuming competence vs instruction on a lower set of standards.

        Dennis, concerning cumulative effect, I suppose it depends on how one defines “cumulative.” Clearly, when the numbers and percentages of special education kids taking CMAs increased dramatically during the initiation years from 2008 to 2011, the “inflation effect” was increasing from year to year. If one looks at Table 1 in my commentary, the increases by grade groups become very apparent, and it’s not at all clear that the increasing pattern statewide is finished as yet. It is easy to project another 5-15% increase in the number and percent of CMAs administered statewide in 2013, if there is no intervening change in policy. It may be quite interesting to produce a Table 1 chart for SFUSD data, both district-wide and perhaps by various subgroups, to ascertain whether the increasing pattern is now likely finished for SFUSD or perhaps still has more increase before stability is reached. I think the CMA usage data is the primary variable affecting what happens to the API inflation effect due to CMA over time.

        Looking at the issue strictly from an API calculation perspective, the inflation due to the CMA effect from year to year will likely abate simply due to stabilizing of the CMA administration counts, with more years of apples-to-apples comparisons [i.e., CMA to CMA for stable cohorts of kids over time] in effect masking overall inflation due to the introduction of the CMAs. The tough thing for policymakers to do will be to acknowledge the negative effects of overusing CMAs and reverse the numbers and percentages of kids taking CMAs over time, because such an appropriate correction will have the effect of deflating API increases for a few years, just like the overuse of CMAs artificially inflated API increases in the recent past.

  5. navigio on Oct 22, 2012 at 3:46 pm

    Btw, the API calculations are done using non-SWD CST results independently of SWD results (CST, CMA and CAPA) for all grades besides 2nd. And districts have the data to do this on a per-ethnicity basis (though CDE does not publish data in such a way to allow the public to do this (??)).

    I have found the differences between non-SWD and SWD API values to be staggering, especially for minority groups and especially in urban or high-poverty secondary schools. That in spite of the CST to CMA shift described here.

    It is troubling that our publicly released API data does not publish this breakdown, and instead lumps all results for an ethnicity into a single ethnic subgroup API score. In spite of the fact that we not only have disproportionate SWD classification rates for minorities, but also a disproportionate rate of CMA participation.

    fwiw, I did a similar breakdown to katy’s. While the differences were not so stark, they were still there. The rates of SPED CMA takers by ethnicity (and percent of total ethnic group, both swd and non in parens) was:
    AA: 48% (9%)
    Hisp: 42% (5%)
    White: 35% (5%)
    Asian 35% (3%)

    I also found that the rate over the years did not increase too much when only looking at the grades for which CMAs were given. Ie the disproportionality was more or less maintained from the beginning, but the total number of students moved out of the CST was obviously increased as the number of grades to which it was given was expanded.

    More troubling were the differences between schools. We had schools below both the 2% overall and 20% SWD targets, but also had schools that went as high as 14% and 66% for those same measures (and those may have even been underrepresented because of how CAPA is reported). That was not broken down by ethnicity, so if the disproportionality mentioned above was consistent, it would mean some ethnic rates that were astronomical. And in fact, a number of individual grades at some schools had well over 80% of SWD students taking the CMA.

    But anyway, I guess we will continue to get districts talking about ‘the achievement gap’ without making these distinctions as well as lauding our schools for continually raising their APIs…


    • Doug McRae on Oct 23, 2012 at 8:22 am

      I agree, Navigio. I think the differences between schools and districts are the most striking: very compelling evidence that schools and/or districts are using the CMAs in ways not anticipated when the CMAs were approved, and that a policy correction is needed.

  6. Dennis on Oct 22, 2012 at 1:15 pm

    Hi Doug, Nice piece. I’m curious about how you generated the data for Table 5. How did you exclude the CMA results in the API calculation? The CDE spreadsheet for calculating API includes worksheets for SWD and non-SWD. Did you leave the SWD worksheets blank and then incorporate the CMA results elsewhere or leave them out completely? Thanks! -Dennis


    • Doug McRae on Oct 22, 2012 at 2:37 pm

      Dennis — First I used the STAR statewide results released 8/31 to estimate the number of kids scoring Advanced, Proficient, Basic, BB, and FBB for each test contributing to the API, including results for CSTs, CMAs, and CAPAs. Then I took the number of CMA kids scoring Advanced, Proficient, and Basic and redistributed them half to BB and half to FBB to get numbers of kids adjusted for CMA. Then I fed those numbers into the CDE’s API calculator on their website. That’s how I got the “adjusted” APIs for Table 5. The actual APIs for 2007/08 were taken from CDE data releases from several years ago, before CMAs were initiated, while the 2012 actual APIs including CMA scores were taken from the API database released Oct 11. Doug
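The redistribution step described in this comment might be sketched as follows. The performance-level counts here are hypothetical, and the CDE API calculator itself is not reproduced; only the reassignment of CMA scores to BB/FBB is shown:

```python
# Sketch of the adjustment described above: CMA test-takers scoring
# Advanced, Proficient, or Basic are redistributed half to Below Basic
# and half to Far Below Basic before the API is recalculated.
# All counts are hypothetical.
cma_counts = {"ADV": 10_000, "PROF": 40_000, "BASIC": 60_000,
              "BB": 50_000, "FBB": 40_000}

moved = cma_counts["ADV"] + cma_counts["PROF"] + cma_counts["BASIC"]
adjusted = {
    "ADV": 0,
    "PROF": 0,
    "BASIC": 0,
    "BB": cma_counts["BB"] + moved // 2,
    "FBB": cma_counts["FBB"] + moved - moved // 2,
}

assert sum(adjusted.values()) == sum(cma_counts.values())  # no students lost
print(adjusted)  # {'ADV': 0, 'PROF': 0, 'BASIC': 0, 'BB': 105000, 'FBB': 95000}
```

The adjusted counts would then be fed into an API calculation in place of the reported CMA counts.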

      • Dennis on Oct 22, 2012 at 10:35 pm

        Hi Doug, Thanks again for providing the details behind your calculations. I was thinking that the Scale Calibration Factor (SCF) would provide some calibration between CMA and CST scores. Since the SCF is negative for SWD results and positive for non-SWD results, the same proficiency level on the CMA and the CST would result in a lower API score for the CMA case. I used the 2012 CST & CMA results for a large urban school district and calculated the following API scores:

        1) 801 – CST results entered into non-SWD worksheets (positive SCF) & CMA results entered into SWD worksheets (negative SCF)
        2) 810 – CST results entered into non-SWD worksheets and CMA results thrown out
        3) 805 – CST and CMA results both entered into non-SWD worksheets
        4) 789 – CST results entered into non-SWD worksheets and CMA results entered into non-SWD worksheets with the following mapping: CMA Advanced and Proficient distributed to CST Below Basic; CMA Basic, Below Basic and Far Below Basic distributed to CST Far Below Basic (similar to how you redistributed CMA results)

        So indeed, the SCF does adjust the API score lower for CMA results, but the adjustment is small in my example (805 -> 801, difference = 4). Redistributing CMA results to CST BB and FBB results in a much larger downward adjustment (805 -> 789, difference = 16).

        • Doug McRae on Oct 23, 2012 at 8:19 am

          Yes, Dennis, your calculations confirm my analysis of the effect of the Scale Calibration Factor. The SCF does partially correct the APIs for the effect of the initiation of CMAs, but it does not provide a full correction. As a result, over time the API trend data has been artificially boosted by the introduction of the easier tests, without adequate adjustments to keep the trend data apples-to-apples.

  7. Katy on Oct 19, 2012 at 1:01 pm

    It’s all down to this: “Presuming competence is nothing less than a Hippocratic oath for educators.”


    “The principle of “presuming competence,” is simply to act as Anne Sullivan did. Assume that a child has intellectual ability, provide opportunities to be exposed to learning, assume the child wants to learn and assert him or herself in the world. To not presume competence is to assume that some individuals cannot learn, develop, or participate in the world. Presuming competence is nothing less than a Hippocratic oath for educators. It is a framework that says, approach each child as wanting to be fully included, wanting acceptance and appreciation, wanting to learn, wanting to be heard, wanting to contribute. By presuming competence, educators place the burden on themselves to come up with ever more creative, innovative ways for individuals to learn. The question is no longer who can be included or who can learn, but how can we achieve inclusive education. We begin by presuming competence.”

    – Douglas Biklen


    • navigio on Oct 19, 2012 at 1:21 pm

      “By presuming competence, educators place the burden on themselves to come up with […] ways for individuals to learn.”

      That is an extremely important statement, and I think for a natural teacher it is simply the way they see the world (in that sense, the term ‘burden’ is the wrong one; it’s probably actually what makes teaching so inspiring). That said, it should be printed out and hung in every board room and district admin’s office, as well as becoming part of an explicit oath for anyone working within public education. I can dream, right?

  8. Pamela on Oct 18, 2012 at 11:25 am

    I found this to be an interesting article, in that it asks: what is the real reason for testing? I have a child with cerebral palsy who is mainstreamed with a one-on-one aide; she is cognitively able, with a 115 IQ, so I’ve been told. However, in her 2nd grade IEP (she is now in 9th), she (I) was given the option for her to be part of the testing. I thought this would be great for three reasons: 1) she would be more integrated with her peers; 2) I would get to see if the school was really meeting her academic needs; 3) it would prepare her for real-life testing with a non-threatening outcome.
    Unfortunately, we never realized that either the aide or the school was filling in the answers, because my daughter is/was unable to physically fill in the bubbles perfectly on the test sheet, until middle school, when she had an aide who filled in the bubbles as my daughter told her. The aide put the test with the rest of the students’ tests to be sent off and corrected, and we later found out that she was not anywhere near proficient, but quite the contrary: unable to score on the test at all because of insufficient answers.
    To make a super long story short, we had her tested yet again through an outside agency and found out she was actually reading at a second grade level in 7th grade, because the school district dropped the ball on her education. Fortunately for our family, we were able to get her literacy help (outside of school) and get our daughter on the right educational track. But I will never trust the district again with my daughter’s education.
    I will continue to have my daughter take the CST or CMA, but only for the act of testing and nothing more. One thing I wish the state would do for our SpEd students who have occupational needs is to put the test in a digital format so the students could click on the answers themselves and the staff could not cheat on students’ outcomes.

  9. Replies

    • Doug McRae on Oct 18, 2012 at 10:27 am

      Excellent article, Katy. You were able to show the effects of flawed implementation of CMAs at the local district and subgroup level in a way I could not address with statewide data. I recommend your article to others.

  10. navigio on Oct 16, 2012 at 10:05 am

    Hi Doug. I am going to do an analysis for our district and have a couple of questions.
    Your article said the target was 2% of enrollment and 20% of special ed, but of course the CMA isn’t given to K, 1st, or 2nd grade kids. Should the 2% and 20% targets be of enrollment including those non-tested grades? Your charts disaggregated by grade level, which of course implies those rates are only of tested enrollment. Was the definition of the target clear enough to know which I should use? Thx.


    • Doug McRae on Oct 16, 2012 at 11:05 am

      Navigio — The 2% target came from the feds, and it applies to enrollment for whatever grades are involved with modified assessments. Since CA has modified assessments for grades 3-11, I apply it only to those grades. If you are a K-6 district, I'd apply it to grades 3-6 enrollment; if you are a high school district, then only to grades 9-11 or whatever grades your district has.

      As an aside, when the feds signaled their OK for the 2% tests in 2006 and provided details for AYP calculations, they allowed up to 2% scoring proficient on a modified test, leading some to conclude the target for test administrations for the modified tests should be greater than 2%. But special education experts like Alice Parker, who is quoted in my commentary, would not agree with that side argument, based on analysis of the overall special education categories and which categories should or should not qualify for modified assessments. But that is an aside for this discussion.

      The 20% of special education enrollment target is just an extension of the 2% federal target, recognizing that generally speaking special education enrollment is about 10 percent of total enrollment. For CA, our special ed enrollment is a notch above 10 percent (tho lower than the national number, which is in the 11-12 percent range, as I recollect). But if you notice the fine print for the local district charts that are linked, the CAPA Level I kids are not included in estimated special ed enrollment, so that exclusion kinda compensates for using a rough extension to arrive at the 20 percent of special ed enrollment target.

      One can get overly precise with the numbers and percentages for an exercise along these lines, especially given the natural variation from district to district and grade to grade — it's best to apply some common sense to the numbers as well as using a calculator. Hope this helps.
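[Editor's note: the arithmetic behind the 2%/20% equivalence above can be sketched quickly. The grade spans and enrollment figures in this snippet are made-up illustration numbers, not real district data.]

```python
# A sketch of the 2%-of-enrollment vs 20%-of-special-ed targets described
# above, applied only to the grades that actually take modified assessments.
# All enrollment figures here are hypothetical.

def cma_targets(enrollment_by_grade, tested_grades, sped_share=0.10):
    """Return tested enrollment plus the 2%-of-enrollment and
    20%-of-special-ed targets for modified-assessment administrations."""
    tested = sum(n for g, n in enrollment_by_grade.items() if g in tested_grades)
    target_total = 0.02 * tested                # 2% of tested-grade enrollment
    target_sped = 0.20 * (sped_share * tested)  # 20% of estimated sped enrollment
    return tested, target_total, target_sped

# Hypothetical K-6 district of 700 kids, 100 per grade: only grades 3-6 count.
enrollment = {g: 100 for g in ["K", 1, 2, 3, 4, 5, 6]}
tested, t_total, t_sped = cma_targets(enrollment, tested_grades={3, 4, 5, 6})
print(tested, round(t_total), round(t_sped))  # 400 8 8
```

When special ed runs about 10 percent of enrollment, the two targets work out to the same head count, which is exactly the extension Doug describes.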

  11. Gary Ravani on Oct 15, 2012 at 2:35 pm

    Of course, "cut scores" are always subjective and do not come carved in stone from on high.

    Special Ed students get special tests with special consideration in scoring because they have special needs; hence the term “special.”

    “One size fits all” has been a failing strategy since it was memorialized by NCLB, the API, and the AYP. Time to get up to speed on that concept.

    Tests are created by humans and the scoring process is controlled by humans. The cut scores are determined by humans. The tests are then taken by humans and then those scores have consequences for humans. Again, let’s not pretend they are sacred documents of some kind.


    • Doug McRae on Oct 15, 2012 at 4:51 pm

      Gary, the CMA initiative is a good example of something that is the opposite of a one-size-fits-all strategy, and I support the CMA itself. It is the flawed implementation of the CMA initiative that draws my wrath, with flaws that deceive special education students, parents, teachers, and the public and lead to overuse of what fundamentally could be a good policy tool.

  12. Sharon on Oct 13, 2012 at 9:40 am

    As a school administrator, I am concerned that students and parents are given an inflated perception of skills and abilities from the CMA. For example, a score of Basic or above on the CMA would give students/parents the perception that the student was performing at or above grade level, when in fact this result would indicate below-standard performance.


    • navigio on Oct 14, 2012 at 1:45 pm

      Well, personally I don't even think our CSTs are intended to provide grade-level measures. If they were, the statewide difference between 3rd and 4th grade ELA proficiency rates wouldn't be 20 percentage points (or 10 points between 2nd and 3rd). Our standardized tests seem more a way to merely differentiate kids from one another, independent of the 'standard'.

      • Doug McRae on Oct 15, 2012 at 8:14 am

        The CSTs are standards-based tests designed to ascertain achievement on content standards, and there is nothing in the tests themselves that would prevent all students from scoring proficient or advanced, which would provide no differentiation at all among kids. But the reality is that learning behavior itself is differentiated, and thus, as a by-product of that reality, scores do differentiate among kids, though that is not the intent of the test. Indeed, we have examples of entire groups scoring advanced . . . in the extreme, we've actually had schools with APIs of 1000, meaning every kid in the school scored advanced on every test . . . . that is an extreme example of no differentiation among kids.

        Your citing of the different proficiency rates for the 2nd, 3rd, and 4th grade ELA tests is, however, a problem with the STAR CST test series itself. It even has a name among folks who follow STAR things closely: the grade 3 ELA "anomaly." It has been known since the CSTs were first introduced in 2003, at first causing head scratching about why 3rd graders score so much lower than 2nd and 4th graders, and after two years it was the subject of a 3-day "task force" meeting involving a group of outsiders along with CDE and ETS staff. A few recommendations came from that effort, but none that corrected the anomaly.

        I was involved in that task force effort in Oct 2005, and my own view is that there was a flaw in the equating of the grade 3 ELA CST to its predecessor, the grade 3 Stanford Achievement Test for ELA, when the transition from SATs to CSTs was made in 2003, a flaw that was never corrected. In any event, the grade 3 ELA anomaly is still with us. It is the cause of the 10-point discrepancy in proficiency rates between grades 2 and 3 and the 20-point discrepancy between grades 3 and 4, and CA has simply lived with this anomaly for the past 7 years without further investigation of its cause or any further corrective action.

        • el on Oct 15, 2012 at 8:56 am

          They are standards-based, but the scores are normed and the tests are not static with static cutoffs year to year.

          While people may be fine with an individual (well-heeled) school scoring 100% proficient, you can bet that for the state as a whole, there is no way the people who oversee the tests would allow 100% proficiency statewide… obviously, such a result would indicate a test with a lack of ‘rigor’. Questions that too many kids get correct are dropped rather than celebrated as an example of student learning. The cutoff scores are determined AFTER the kids take the test, not before.

          Regardless of the official motivation, the people who make and score the tests have a strong incentive to keep that differentiation alive, because if all the kids scored proficient, why would the state continue to spend money administering the tests? And certainly there’d be no incentive to buy more curriculum.

          The public sees these scores and thinks 'man, school is much worse than when I was a kid,' but they have no idea how much higher our expectations of kids at each grade are than they used to be.

          • Doug McRae on Oct 15, 2012 at 9:43 am

            EL — The CST scores are not normed. The tests have not been static (i.e., items have been retired from year-to-year and replaced by new or previously used but still secure items), but every reasonable attempt has been made to keep the cutoff scores static from year-to-year. Cut-off scores are set after the initial first administration of a test, and then maintained to be static from year-to-year, rather than reset every year.

            I wouldn’t agree with your comment on the motivations of people who make or score or oversee the test, but I would agree that in general the public is not informed how much higher the expectations are than they used to be. The general public also is not well informed how much higher K-12 achievement levels are in other states or in other countries, or how California’s K-12 enrollment differs significantly from other states and other countries.

        • navigio on Oct 15, 2012 at 10:03 am

          Hi Doug. First off, I appreciate your detailed answer, and I apologize for not being careful enough in the words I chose.

          While I agree that the test is not designed to striate an arbitrary set of students (not what I meant to imply, sorry), it does clearly do this at the state level. But my response was more intended to challenge the notion that a particular classification band (on either the CST or the CMA) is somehow indicative of grade level. It would be nice if it were, but I think the 2nd-3rd-4th grade anomaly makes it clear it's not (or alternatively, that the standards are perhaps not mapped properly to grade level, which would be even more problematic). Even then, the whole notion of introducing a new test (the CMA) should make it clear that 'grade level' can have multiple meanings depending on context. Sadly, the media uses 'grade level' all the time, and generally seems to equate it to the proficient-or-above band.

          It is interesting that this was noticed, met about, and then left alone. :-) I know parents who see the 3rd grade drop and think, ‘oh no! Johnny is slipping.’ While Johnny may in fact be slipping, understanding that the drop is, in some sense, ‘expected’, puts things in a much different light. Ironically, in our district, we had a few schools that had humungous drops in 3rd grade this year (but increases in all other grades). Knowing these schools, I think the drop was largely a demographic one based on star alignment, but I also think the ‘expected drop’ exacerbated the results and really has a lot of people scratching their heads.

          FWIW, I think living with the anomaly has reduced the credibility of the CST. But that’s just me. :-)

          • Doug McRae on Oct 15, 2012 at 11:03 am

            Navigio — OK, I agree re challenging the notion that a score on a CST is somehow meant to be indicative of "grade level." We've had scores from national norm-referenced tests for more than 50 years, so-called grade equivalent scores, that have been widely misinterpreted as "grade level" scores, and now we have scores from individual state standards-referenced tests misinterpreted the same way. To my knowledge, no one responsible for developing a widely used K-12 test has ever defined the scores from that test as "grade level" scores. Also, I agree that the de facto decision to live with the STAR CST grade 3 ELA anomaly has hurt the credibility of the CST system. My preference was (and is) to address the anomaly and correct it.

          • Ze'ev Wurman on Oct 15, 2012 at 11:30 pm

            Doug already explained most of the details. Where I slightly disagree is with regard to the "grade level" meaning for STAR. I feel that this term DOES have a meaning, given that the test cut scores were originally established using the "bookmark" procedure. It basically captured what a group of experts at the time felt were appropriate cut scores for *that* particular test at *that* time. Once established, these cut scores have not been updated (or reset), and if students improve over time, there is nothing preventing most or all students from scoring proficient.

  13. Katy on Oct 12, 2012 at 4:02 pm

    Yes, SFUSD press releases make my teeth ache. They claim “double-digit gains in achievement” but do not account for all the students who no longer take the CST. Obviously when you remove the lowest scoring students from the equation, and do not even include them in the calculations anymore, the “percent proficient or above” looks much better.

    As Bill Clinton said: “it ain’t hard — it’s ARITHMETIC!”

  14. Doug McRae on Oct 12, 2012 at 3:42 pm

    EL: The CMAs do have only 3 distractors compared to 4 on the CSTs, as well as larger print and more white space. But those are only surface characteristics. The real difference between the CMAs and CSTs is the placement of cut scores for the performance levels — if both tests were placed on a common scale of measurement, Proficient on a CST would be higher on that achievement scale than Proficient on a CMA; for example, Proficient on a CMA might be somewhere around Basic on a CST. We don't have good quantitative estimates for exactly how the two tests compare in terms of performance levels.

    The study you suggest, whereby CMAs and CSTs are administered to the same students, would provide such estimates, but such studies have never been done. Such studies, or other ways to estimate the comparability of scores from CMAs to CSTs, would have been good test development protocol for the CMAs. Because they were not conducted, policymakers don't have the appropriate information for decisions on how best to treat CMA vs CST scores for (say) accountability system calculations.

    One exception is the comparability work done by CDE staff for the CAHSEE alternative means issue. Using CAHSEE passing scores as a common scale of measurement, CDE staff were able to estimate what score on a CST was equivalent to a CAHSEE passing score, and what score on a CMA was equivalent to the same CAHSEE passing score; combining that information, we can estimate how a CMA score compares to a CST score. But that information is available only for Grade 10 E/LA and Algebra, and we don't know whether it generalizes to E/LA tests for other grades or other math tests. If we had comparability information for the entire set of CMA tests against all counterpart CST tests, we'd be in far better shape to design a good research-based solution to the situation outlined in my commentary.
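[Editor's note: the chained comparison Doug describes (CMA → CAHSEE scale ← CST) can be sketched as two linear linkings composed together. Every anchor point below is invented for illustration; these are not real CDE or CAHSEE values.]

```python
# Sketch of linking two tests through a common scale: if we know which CST
# score and which CMA score each correspond to the same CAHSEE scores, we
# can chain the two relationships to express a CMA score on the CST scale.

def linear_link(score, anchors):
    """Map a score through the line defined by two (from, to) anchor pairs."""
    (x0, y0), (x1, y1) = anchors
    return y0 + (y1 - y0) * (score - x0) / (x1 - x0)

# Hypothetical (test scale score, CAHSEE scale score) anchor pairs.
cma_to_cahsee = [(300, 320), (400, 370)]
cst_to_cahsee = [(300, 345), (400, 405)]

def cma_equivalent_cst(cma_score):
    """Chain CMA -> CAHSEE, then invert CST -> CAHSEE to land on the CST scale."""
    cahsee = linear_link(cma_score, cma_to_cahsee)
    inverse_cst = [(y, x) for x, y in cst_to_cahsee]  # swap pairs to invert the line
    return linear_link(cahsee, inverse_cst)

# With these made-up anchors, a hypothetical CMA "Proficient" cut of 350
# lands right at a hypothetical CST "Basic" cut of 300.
print(cma_equivalent_cst(350))  # 300.0
```

The point of the sketch is the structure, not the numbers: once both tests are anchored to any common scale, a cut score on one test can be located on the other's scale, which is exactly the kind of comparability information the comment says is missing for most CMA/CST grade-subject pairs.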


    • el on Oct 12, 2012 at 11:02 pm

      I didn’t realize they monkeyed with the cut scores – that obviously would dramatically affect how those scores are used… and seems a little, um… let’s go with, “difficult to defend” … after they went to all the trouble to put all the same standards on the test.

  15. el on Oct 12, 2012 at 2:12 pm

    I have questions about the nature of the CMA test. I understood that it had three possible answers for each question instead of 4, but that the questions themselves were similar. This would have the effect of (a) less time needed to do each question (b) less opportunity for confusion and (c) a higher bottom score given the nature of random guessing of answers. If my understanding is correct, it is (c) that is the most inappropriate skew.

    Has anyone ever done a study where the CST and the CMA were administered to the same students (with ranges over all the 5 rankings) and their scores were compared?
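[Editor's note: point (c) above is expected-value arithmetic — blind guessing earns about 1/4 of the points on four-choice items but about 1/3 on three-choice items. A quick simulation, with a made-up 60-question test length:]

```python
# Simulating the guessing floor: the score from pure random guessing rises
# when each question offers 3 choices instead of 4. The 60-question test
# length is an arbitrary illustration figure.
import random

def chance_score(n_questions, n_choices, trials=5_000, seed=0):
    """Average fraction correct across many simulated blind guessers."""
    rng = random.Random(seed)
    correct = sum(
        rng.randrange(n_choices) == 0  # option 0 plays the "right answer"
        for _ in range(trials * n_questions)
    )
    return correct / (trials * n_questions)

print(round(chance_score(60, 4), 3))  # close to 1/4
print(round(chance_score(60, 3), 3))  # close to 1/3
```

So even before any cut-score differences, a CMA-style three-choice format raises the floor of the score distribution by several percentage points relative to a four-choice format.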

  16. Doug McRae on Oct 12, 2012 at 2:03 pm

    Navigio — Thank you for the compliment. On a couple of your points:

    Clearly, the district-level CMA participation charts are the most striking of all the data I crunched for this analysis. Much of the statewide data tends to mask the effect of CMAs on CST and API aggregate data over time. If we could do school-by-school CMA participation charts, as you kinda suggest, I'm sure there would be even more striking data on who overuses the CMA and who doesn't, but for many schools there isn't enough data to support meaningful stats for generalizations. If any reader wants to get an idea what this post is all about, they should go to the local district CMA participation chart for a familiar county and focus on the range of CMA use from district to district. Such a look will cause a lot of head scratching.

    It occurred to me that the CMA may be an incentive to overclassify low-scoring students into special education, but like you I have no way to investigate that possibility. It also occurred to me that using the CMA to manipulate APIs is similar to the Texas Supt now in the slammer for actively excluding low-scoring kids from his high schools, but the CMA situation is much more nuanced and less blatant. Unfortunately, I think a lot of people in the trenches don't have a good idea what the CMA is for . . . instead, they just see an easier test and go in that direction without considering potentially adverse consequences for the student.

    I did adjusted APIs at the statewide level only, via hand-calculating data from the STAR statewide results and then feeding the numbers into the API calculator spreadsheets on the CDE website. To get adjusted APIs for districts and/or schools, one would have to have better computer skills than I have been able to maintain in recent years . . . . I've been retired for a bit now, and the AARP doesn't provide good computerized spreadsheet or other data manipulation support . . . . (grin).
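[Editor's note: the core of the API calculation Doug mentions is, as I recall the published methodology, a weighted average of performance-level counts on a 200-1000 scale (weights 200/500/700/875/1000 from Far Below Basic up to Advanced), before the content-area weights the full formula applies. A sketch of the kind of adjustment he describes, with invented counts:]

```python
# Recomputing an API-like number from performance-level counts, using the
# published API performance-level weights. The student counts below are
# invented, and the full API formula also applies content-area weights
# that this sketch ignores.
WEIGHTS = {"FBB": 200, "BB": 500, "Basic": 700, "Prof": 875, "Adv": 1000}

def api_component(counts):
    """Weighted average of performance-level counts on the 200-1000 API scale."""
    n = sum(counts.values())
    return sum(WEIGHTS[level] * k for level, k in counts.items()) / n

# Illustration: shifting some kids from an easy test's "Prof" down to the
# "Basic" their score might represent on the harder test lowers the result.
reported = api_component({"FBB": 5, "BB": 10, "Basic": 25, "Prof": 40, "Adv": 20})
adjusted = api_component({"FBB": 5, "BB": 15, "Basic": 45, "Prof": 25, "Adv": 10})
print(reported, adjusted)  # 785.0 718.75
```

This is why counting CMA performance levels at face value inflates the API: the weighted average rewards the easier test's higher levels as if they were earned on the common scale.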

    Finally, on your parenthetical about the grade 8 Algebra issue, I would applaud the Supt who held 8th graders out of Algebra I if he or she thought the students were not ready. Indeed, one of the flaws of the 15-year-old initiative to increase Algebra I for 8th graders has been the lack of a good diagnostic placement tool for assisting with the 8th grade Algebra decision. Widespread use of such a tool would take a lot of steam out of the grade 8 algebra debate — the current problem with that debate is the assumption that it has to be one-size-fits-all, rather than an acceptance that some kids are good to go for algebra by grade 8 while others need a helping of pre-algebra. A good tool to help make those decisions for individual kids would prevent many of the inappropriate decisions made in a one-size-fits-all mentality.

    Again, thanx for your comment.

  17. navigio on Oct 12, 2012 at 10:36 am

    Doug, very much appreciate your analysis and EdSource for giving you the forum. We need (much) more of this level of analysis if we are going to continue to use test data as a ‘measure’ of our education system. A couple of additional points:

    When John first mentioned your analysis a couple months ago, I went looking at participation rates in different schools and noticed that some schools clearly make a point of sticking with the CST while others clearly make a point of moving toward the CMA. This, of course, introduces the same inflation, at the expense of the schools that choose the CST because they think it is more appropriate. In other words, it's a penalty for doing the right thing. (On a semi-related note, I noticed the superintendent of the highest-scoring district in the state mentioned that they actively decided to have some 8th graders hold off on their Algebra I CST because they wanted them to be ready rather than take it too soon. He also acknowledged that this policy would negatively impact their API, but that they felt it was the right thing to do: "We're not going to let the tail wag the dog for test scores. We're proud of being No. 1, but also proud of the way we got there.")

    I would also point out that the ability to use the CMA as a way to inflate API scores creates an incentive for policy that over-classifies lower-achieving students as special ed. I don't know whether this is actually happening, but I would not be surprised, especially given that over-classification of those subgroups is a known statewide problem.

    I would even go one step further and point out that a Texas superintendent was recently sentenced to 3 1/2 years in prison for actively counseling low-performing students out of high school altogether, with the explicit intent of improving his district's accountability scores. It is not only ironic that that actually used to be policy decades ago; the manipulation of test procedures mentioned above seems very much an analogous type of fraud.

    And although I understand the intent of trying to have a specific measure for those for whom the standard CST is not appropriate, I think it will be impossible to eliminate this type of manipulation as long as there is a difference. Especially given how our definition of special education has expanded over the past few decades.

    Finally, I think it would be helpful to see an actual listing of some APIs and 'adjusted' APIs. From what I can tell, based on the average calculated inflation, our schools have improved only about half as much as the reported API makes it seem. Well, that's only for people who think the API means something…
