Photo: Students taking Smarter Balanced practice tests at Bayshore Elementary School in Daly City. (Laurie Udesky/EdSource Today)

Smarter Balanced tests administered in California and other states are well-aligned to Common Core standards in math and English language arts, but could be improved, according to two new studies.

A study of high school tests ranked Smarter Balanced higher than the three other standardized tests evaluated, but the other study of 5th- and 8th-grade tests identified weaknesses that Smarter Balanced officials plan to address in response to the report.

Both studies, released Thursday, compared Smarter Balanced tests to those created by the Partnership for Assessment of Readiness for College and Careers, or PARCC; ACT Aspire; and the Massachusetts Comprehensive Assessment System, or MCAS. Like Smarter Balanced, PARCC tests were developed specifically to measure student achievement according to Common Core standards. The ACT Aspire test, administered in multiple states, also measures achievement based on Common Core standards, while the “MCAS was included because it represents what has been considered ‘best in class’ for individual state assessments up until this point,” according to the Human Resources Research Organization, or HumRRO, which conducted the study of high school tests.

Smarter Balanced tests that are given to 11th-graders in California are used by California State University and nearly 80 community colleges statewide to measure student readiness in math and English language arts.

“In the new era created by the federal Every Student Succeeds Act signed into law in December 2015, states have the responsibility to ensure their educational systems produce students who are prepared for the worlds of higher education and work,” HumRRO’s executive summary states. “Student assessments are the primary mechanism for gauging the success of state and local educational systems.”

The Fordham Institute’s study, “Evaluating the Content and Quality of Next Generation Assessments,” looked at tests given to 5th- and 8th-graders.

Both reports were based on “Criteria for Procuring and Evaluating High-Quality Assessments,” released in 2014 by the Council of Chief State School Officers, a nonpartisan nonprofit made up of education leaders from every state in the country.

Experts looked at test questions and answers, along with scores, and rated the tests based on math and English language arts content, as well as the depth of questions. Accessibility for students with special needs and English learners was also rated.

In both reports, reviewers’ scores were aggregated to assign one of four ratings in each sub-category, based on how well the tests matched the criteria: Excellent Match, Good Match, Limited/Uneven Match, or Weak Match. Overall scores were not given and the reports did not study the tests’ reliability or validity, meaning their consistency or ability to measure what they claimed to measure.

HumRRO’s report gave Smarter Balanced’s high school exams Excellent Match or Good Match ratings in all areas. The Fordham report, on the other hand, gave Smarter Balanced high marks in all areas except “high quality” questions and “variety” of question types in math, for which it received Limited/Uneven Match.

The report suggested that Smarter Balanced could strengthen its tests by removing “serious mathematical and/or editorial flaws” found in some questions. During a panel discussion Thursday about the report, Morgan Polikoff, assistant professor of education in the Rossier School of Education at the University of Southern California, said some reviewers found math questions in Smarter Balanced tests that could have more than one correct answer.

The Fordham report said, “The program could better meet the depth criteria by ensuring that all items meet high editorial and technical standards and that a given student is not presented with two or more virtually identical problems.”

In a written response to the Fordham study, Smarter Balanced Executive Director Anthony Alpert said the report did not adequately evaluate the “computer-adaptive” nature of the tests, which customize test questions for each student based on their answers to previous questions.

Alpert also described the rigorous process Smarter Balanced uses to create “only high-quality questions on our tests,” but promised to use the study to improve its process for test question development.

“Immediately,” he said, “Smarter Balanced will initiate a detailed review of the existing test questions based on the feedback from this report.”

Although Smarter Balanced received mostly Excellent Match ratings for its high school test, it was rated Good Match in many of the Fordham report categories, including English language arts vocabulary and language skills, and in the majority of subcategories under “depth.”

“The program is most successful in assessment of writing and research and inquiry,” the report said.

It praised Smarter Balanced for assessing listening skills, which none of the other test companies did. However, the report suggested that Smarter Balanced could improve its language arts tests by strengthening its vocabulary questions, increasing the difficulty level of 5th-grade questions and, “over time, developing capacity to assess speaking skills.”

Alpert responded: “With English language arts, we will discuss the report’s findings with our membership and consider changes.”

His response to the more favorable HumRRO report touted the high ratings Smarter Balanced received as evidence of the tests’ alignment to Common Core standards, and their ability to assess college and career readiness.

“As educators become more knowledgeable about Smarter Balanced,” Alpert wrote, “we are confident our assessment system will continue to be recognized as a historic and groundbreaking system to improve teaching and learning.”

Although some states have decided to use the SAT or ACT to assess college and career readiness, the HumRRO report did not evaluate those tests.

“ACT Aspire, PARCC, and Smarter Balanced are administered to about 40 percent of students in grades 3-11 nationwide,” said Suzanne Tsacoumis, HumRRO vice president, in an e-mail. “When this study was undertaken, both SAT and ACT were undergoing major overhauls of their tests that wouldn’t be ready for several months. A review of the soon-to-be-outdated tests wouldn’t be useful. Of course, now that they’re available, we encourage them to subject their tests to external review.”

Tsacoumis said one of the main reasons the studies were conducted was that students often pass their states’ standardized tests, yet arrive at college unprepared for coursework and in need of remediation. This has led some to assume that the “tests are not of high quality,” she said.

The goal of the studies was to assess the quality of the tests, based on Common Core standards, in order to help states make decisions about what tests to use in the future, according to the reports.



  1. Sandra Stotsky, 9 months ago

    Reporters should take a look at the critique by a testing expert of Fordham’s pretend research to better understand why the use of evaluation criteria designed by Common Core advocates on Common Core-aligned tests unsurprisingly finds such tests better than those that weren’t. Duh!


    • Don, 8 months ago


      Thank you, Dr. Stotsky, for providing some context to adequately understand the value (or lack thereof) of the study referenced in this article.

      I have taken the liberty to copy below a small portion of the critique you linked above.

      4. When not dismissing or denigrating SBAC and PARCC critiques, the Fordham report evades them, even suggesting that critics should not read it: “If you don’t care for the standards…you should probably ignore this study” (p. 4).

      Yet, cynically, in the very first paragraph the authors invoke the name of Sandy Stotsky, one of their most prominent adversaries, and a scholar of curriculum and instruction so widely respected she could easily have gotten wealthy had she chosen to succumb to the financial temptation of the Common Core’s profligacy as so many others have. Stotsky authored the Fordham Institute’s “very first study” in 1997, apparently. Presumably, the authors of this report drop her name to suggest that they are broad-minded. (It might also suggest that they are now willing to publish anything for a price.)

      Tellingly, one will find Stotsky’s name nowhere after the first paragraph. None of her (or anyone else’s) many devastating critiques of the Common Core tests is either mentioned or referenced. Genuine research does not hide or dismiss its critiques; it addresses them.

      Ironically, the authors write, “A discussion of [test] qualities, and the types of trade-offs involved in obtaining them, are precisely the kinds of conversations that merit honest debate.” Indeed.
