State’s largest urban districts post gains on national assessment

Manuel 10 years ago
BTW, do Governor Brown’s recent remarks about standardized testing (he thinks that national education standards are “just a form of national control”) have any bearing on this conversation about the validity of this “notable progress”?

Where was he in the headlong rush to Common Core and SBAC?
- Doug McRae 10 years ago
  Manuel: The governor’s views on standards and testing [provided your description of his view that national standards are just a form of national control is an accurate description of his view] are indeed a bit independent of most Dem governors across the country, and rather more in concert with many Rep governors. The criticism of the common core from the tea party right focuses mostly on the fear that common core standards are or can be a form of national control, particularly over curriculum and instruction at the local level, potentially via high-stakes tests influencing curriculum and instruction at the local level. As a testing guy, I share the governor’s concerns along these lines: I think it’s a bad use of statewide tests to directly influence local district curriculum and instruction in a particular direction, bad for good curriculum and instruction and bad for large-scale testing. Yet this seems to be exactly the direction that California is headed via acceptance of the Smarter Balanced consortium tests.
  
  FYI, the Gov was Atty Gen when common core was adopted, but he was Gov when CA joined SBAC. My guess is that if he had known that Smarter Balanced assessments would violate his principle of subsidiarity (or local control as much as possible, particularly for curriculum and instruction choices), he would not have signed the letter joining the Smarter Balanced consortium in June 2011.
  - Manuel 10 years ago
    Doug, “All I know is what I read in the papers.”
    
    Governor Brown’s remarks were carried by the LA Times and by the Sacramento Bee.
    
    The remarks were part of “an on-stage interview Monday with the Atlantic magazine’s James Bennet at the Computer History Museum” in Mountain View.
    
    As for his relative silence until now, he could have made his opinion known when he was the AG. My guess is that when he signed the SBAC letter he knew that the horse had left the barn and just bided his time. Now there are no CSTs and the SBAC is a couple of years down the line. Maybe he wants to see if the schools will collapse without standardized testing. When that doesn’t happen, he will probably say, “I told you there’s no need for testing.” But that’s just a guess, a judgment call, if you will 😉
Gary Ravani 10 years ago
“Not everything that can be counted counts, and not everything that counts can be counted.”

Albert Einstein

“Not everything that is ‘statistically significant’ is actually significant in real world terms.”

(I made that up.)

Since the nation embarked on standards- and test-based accountability in 2002, and CA nearly a decade before that, there has been little in the way of “significant” progress, particularly for poor and minority populations, even within the limited scope of what can be measured by various tests.

As Yogi once said, “When you come to a fork in the road, take it.” Well, we did and it was demonstrably the wrong policy fork. It could be said that it is time to “re-think” educational policy choices if it were true that there was a lot of thinking that went into the policy choices the first time. Of course, there was not. Just a lot of finger pointing and attempting to hold people “accountable” who have little control over the real conditions that create low achievement, coupled with campaigns to divert public attention from the policies that could result in improved learning. These include closing the school funding gap, the preschool gap, the affordable housing gap, the healthcare gap, the living wage gap and all the other gaps that, cumulatively, create the “achievement gap.” Well, it’s a new year. There is always hope and the chance for a new campaign for quality public education. Cheers!
Kathy Baron 10 years ago
Manuel,
I spoke with a psychometrician at the National Center for Education Statistics and asked how they define statistically significant results. He said it’s not a single measure. It includes the sample size and the percentage of low income, English learner, special ed, and racial/ethnic minority students; the size of the district; how high the gains were overall and within each subgroup; and how the gains compare to the state and national results.

You can see some demographic information here:
http://nationsreportcard.gov/reading_math_tuda_2013/files/Tech_Appendix_2013_Math_TUDA.pdf
At the top of the page there’s a pull-down menu to select the district, grade and subject.
- navigio 10 years ago
  IMHO a psychometrician should be able to tell you off the top of his head what statistically significant is for each of these districts and subgroups. When I become one, I’ll let you know. 😉
  - Doug McRae 10 years ago
    Navigio: You overestimate what a psychometrician can do “off the top of his head.” A good psychometrician may be able to provide a “back of a napkin” estimate for statistical significance based only on sample size [also called a “poor man’s statistical estimate” by a really high-powered pure statistician I got to know in graduate school], but the kind of precise statistical significance calculation done by NCES / NAEP folks for TUDA data cannot be done on an off-the-top-of-my-head basis. When you become one, you’ll understand why [grin].
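For readers wondering what a “back of a napkin” estimate based only on sample size might look like, here is a minimal sketch. It is not the NCES / NAEP procedure, which the comment above says is considerably more involved. It assumes simple random samples and an equal score spread in both years, and the standard deviation (35 points) and sample sizes below are hypothetical values chosen only for illustration.

```python
import math


def napkin_z(gain: float, sd: float, n_before: int, n_after: int) -> float:
    """Rough z-statistic for a change in average score between two samples.

    Assumes simple random sampling and the same standard deviation (sd)
    in both years. Real NAEP calculations do not rely on these shortcuts.
    """
    standard_error = sd * math.sqrt(1.0 / n_before + 1.0 / n_after)
    return gain / standard_error


# Hypothetical inputs: a 4-point gain, sd of 35 points, 2,000 students per year.
z = napkin_z(gain=4.0, sd=35.0, n_before=2000, n_after=2000)
print(f"z = {z:.2f}; beyond 1.96 (5% level): {abs(z) > 1.96}")

# The same 4-point gain with only 100 students per year.
z_small = napkin_z(gain=4.0, sd=35.0, n_before=100, n_after=100)
print(f"z = {z_small:.2f}; beyond 1.96 (5% level): {abs(z_small) > 1.96}")
```

Under these assumptions the same gain clears the usual 5% threshold with the larger samples and falls short with the smaller ones, which is the sense in which a napkin estimate depends on sample size.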
    - navigio 10 years ago
      But Doug, I do them all the time and I’m not even a psychometrician yet. 😉
      
      Kidding aside, she spoke with a psychometrician from the National Center for Education Statistics, not just some random statistician. It seems surprising that one of them would have no feel for the variables needed in conjunction with sample size for something like NAEP. Anyway, I expect that information is in the study itself. I’ve only been on my phone today so I’ll go read that later.
- Manuel 10 years ago
  Kathy, thank you for the reference.
  
  Here’s my problem: “notable progress” is a very relative term because no context is given until one takes a look at the provided graphs. But the average person out there would not classify a less than 2% change on anything as “notable progress.” Anyone getting a 2% pay raise, for example, will never call that “notable progress.” 5% maybe, 10%, definitely.
  
  Then we get into statistics. A psychometrician’s job is to design tests that can be used to compare populations both in time and across cohorts. To do that, the most common tool is designing a test whose results will match the Bell Curve as much as possible. If that’s indeed the case, then the average must be stable because if it isn’t, then you can’t compare populations. Yes, Doug may come by and tell us that “growth” is allowed, but after 10 years of CSTs the number of kids not proficient always hovered around 50%, which is not surprising since the average was where the proficient cutoff point was set.
  
  What this means is that the NAEP test, which ought to also be Bell-Curved, not “criterion-referenced,” probably has the same issues: the average value will fluctuate but will never show large swings. (If it did, it would not be a well-designed test.) Hence, Duncan, Deasy, et al. will talk about “notable progress” and “be elated” and declare victory when 2% positive changes to the average happen. (Please note that the change in the national average is even smaller, but you don’t hear Duncan bemoaning that there is no growth nationally!)
  
  Unfortunately, no psychometrician who wants to keep his/her job is going to raise these issues. That is why they like to “take the fifth” and say it is not a single measure. But if that is the case, why call these almost-meaningless fluctuations “notable progress”?
  
  Anyway, I did look at the pdf and I’d be willing to bet that the sampling for LAUSD is mostly Latino kids who are poor. Why do I state this? Because the latest enrollment figures show that roughly 10% are white, 10% African American, and less than 10% Asian Americans. Plus roughly 76% of students qualify for a free lunch. With those demographics, it is not surprising that LAUSD has a relatively lower average than the nation. But what else is new?
  
  BTW, I hope you note that my disappointment is with how officials can turn even the most minute “positive” change into a major cause for celebration. Unfortunately, you have to report it and, unless you can find a Deep Throat in the testing industry, you won’t be able to include contrarian opinions. Unless, of course, you listen to non-experts such as me (or navigio!) 🙂
  - Doug McRae 10 years ago
    Yup, Manuel, I’m coming by to repeat the information that the Bell Curve or normal distribution characteristic of test scores is a reflection of human behavior (in this case, spread of academic achievement) rather than type of test. NAEP actually has a real history of being a criterion-referenced test: from the late 60s, when NAEP was first introduced, to the mid-80s, all interpretation was based on item or small clusters of items data, no total scores at all, which is one of the characteristics of true criterion-referenced testing. But since the mid- or late 80s we have had some sort of total scores from NAEP, for the last 20 years we’ve had state-by-state NAEP results, and for the last 10 years we’ve had NAEP data for self-selected large urban districts. In all of these cases, the score cutoffs remain constant over time, and thus allow for gain or growth measurement over time. And folks can do the necessary computations to determine whether the gains are statistically significant or not. But as Gary notes elsewhere in this thread, not everything that is statistically significant is educationally meaningful, and it is a judgment call (not a statistical call) whether or not 2 or 4 or 6 or whatever point gains are “notable” or not.
    
    I’d also take issue with your comment that, for CA’s STAR CSTs, “after 10 years the number of kids proficient always hovered around 50 percent.” That’s not true: CA growth data over time show average percent proficient mostly in the 30s back in 2003, with gains over time for most CSTs to greater than 50 and many over 60 percent proficient by 2013. The real question for growth measurement is not whether one or two points are meaningful from year to year, but rather whether trend measurement over multiple years shows meaningful gains. As a guideline for good or meaningful gains on statewide standards-based tests, Bob Linn commented about 10 years ago that consistent gains in the 3 to 4 percent range over time should be interpreted as good, meaningful gains; I agree with Bob’s observation. For darn near 40 years, I’ve suggested to schools [not districts or states with much larger sample sizes] that gains of 10 percent should be interpreted as meaningful and that frequently it takes several years for a school to establish that kind of growth across grades and content areas. With the national NAEP, given the size of the population it is designed to track, gains in the 4-point range (reasonable target for statewide results) or in the 10-point range (reasonable target for school-level results) are too large, in my judgment. TUDA NAEP gains in the 3-4 point range for the large urban districts from year to year qualify as “notable” gains, again in my judgment. Statistically significant gains would be less than these “educationally meaningful” guidelines.
    - navigio 10 years ago
      I don’t think it’s merely a judgement call whether a given point gain is ‘notable’. If the gain is not statistically significant in the first place, then it’s equivalent to no gain whatsoever. Treating zero gain as notable makes no sense. It’s true that once something is considered statistically significant, it is a judgement call as to what you do with that information, but I think that’s something other than what is being discussed.
      
      FWIW, according to the study, 10-year changes of less than about 10 points in math and about 8 points in reading were not considered statistically significant. For LAUSD, all-student changes were considered statistically significant in all tests except 8th grade math (mentioned in the story).
      
      It’s probably also important to mention that very few of the district-level subgroup results were considered statistically significant when compared to 2011.
      
      And regarding the notion of ‘notable’: I did find it notable that the Fresno superintendent indicated he put more stock in these results than in those we get from our state accountability metrics. It would be very interesting to hear why he thinks that and what he thinks we should change.
      - Doug McRae 10 years ago
        Navigio: OK, if a gain isn’t stat significant, I’d concede it isn’t a judgment call. But for educational testing data, the stat significance calculations assume that kids are randomly assigned to states or schools or subgroups or whatever, and we know that ain’t the case. So, the assumption of random assignment translates to unrealistically low gain or growth numbers being stat significant, by and large, and means softer, judgment-based higher numbers are needed before folks should treat gains as noteworthy or notable. It is for this reason that interpretation of test result gains is for the most part a judgmental rather than scientific or statistical thingie.
        
        Manuel 10 years ago
        Bottom line: any “pronouncements” about test results are “judgement calls” and based on the experience and/or bias of the source.
        
        So how can policy (such as “we want better outcomes with LCFF money”) be based on such ethereal conceits? If there is no science or, the Goddess forbid, statistics involved, how can there be any “authority” to base policy on something that not even Deasy can put his finger on other than to claim credit for?
        
        Even the LA Times editorial board is starting to wonder about this if one can believe their editorial on the TUDA results. Here are the last three paragraphs from this editorial:
        
        “Researchers say it’s impossible to ferret out the reasons because the implementation of school reforms tends to be haphazard, overly broad and seldom assessed. The higher scores seem to indicate, as reformers have claimed, that smaller class sizes don’t necessarily matter much; class sizes increased during the last few years because of the state’s budget crisis even as the test scores went up. At the same time, scores rose without the change sought by Supt. John Deasy and other reformers that would tie teachers’ performance ratings to their students’ test scores. Apparently, teachers are successfully improving scores without that kind of pressure.
        
        The higher test scores might reflect policies from years ago that are only now starting to show results. Or some factors might not even be related to changes at schools at all, said UC Berkeley education professor Bruce Fuller. Education levels among Latina mothers have been rising, and maternal education has long been considered an important factor in early literacy.
        
        With hundreds of millions of dollars coming to L.A. Unified from an improved state budget and a new school funding formula, it’s more important than ever for the district to use the money in targeted ways that can be measured and then copied if they’re successful. Future progress depends on knowing what works.”
        
        Notice that while they call for finding out what works, they don’t demand that this answer be found before spending all the “new” money. Judgement call, indeed.
Manuel 10 years ago
An increase of 4 points from a baseline of 201/246 at LAUSD is considered “notable progress” by Duncan? Really?

That’s a 2%/1.6% increase.

“Notable progress”?

Maybe in an alternate universe.

(Please forgive my inability to see clothes on this very naked emperor.)
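
The percentages in the comment above can be checked with a few lines of arithmetic. This is only a sketch of that calculation: the function name is made up for illustration, and it treats the quoted figures (201 and 246) as the starting scores, which the comment does not spell out.

```python
def percent_change(baseline: float, gain: float) -> float:
    """Return the gain expressed as a percentage of the baseline score."""
    return 100.0 * gain / baseline


# Baselines and the 4-point gain as quoted in the comment above.
for baseline in (201, 246):
    print(f"baseline {baseline}: {percent_change(baseline, 4):.1f}%")
```

Whether a percentage of the scale score is the right yardstick for “notable” is a separate question, and one the rest of this thread argues over.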
navigio 10 years ago
Does this mean CORE doesn’t need the waiver anymore?
- John Fensterwald 10 years ago
  NAEP scores weren’t one of the metrics that are tied to the waiver, navigio, but I imagine that the encouraging results won’t hurt when Arne Duncan decides whether to grant the waiver for another year next summer.
  - navigio 10 years ago
    Nice redirect John. 😉
    
    To the extent the results reflect anything, they do not reflect any of the changes that the waiver allowed. On the contrary, they indicate that whatever was happening prior to the waiver was actually working (again, subject to interpretation). Some of the proposed changes, including cutting SES intervention, will result in a real and significant change for students (though based on earlier comments at least Fresno seems to recognize the danger in that).
    
    If these results are not matched in future assessments, the waiver will be interpreted as having caused that failure, given that its freedoms only took effect after these tests were taken.
    
    It is quite noteworthy that Deasy was so surprised by this, given how he characterized last year’s performance gains. I’m sorry to say that this heightens my suspicion that he does not actually believe assessment results can be tied in a causal way to district policies.
    - Kathy Baron 10 years ago
      Navigio, please let me clarify John Deasy’s remarks. He was more elated than surprised. He expected the district to do well, but the gains exceeded those expectations.

Kathryn Baron

December 18, 2013
