A study of high school tests ranked Smarter Balanced higher than the three other standardized tests evaluated, but a separate study of 5th- and 8th-grade tests identified weaknesses that Smarter Balanced officials plan to address in response to the report.
Both studies, released Thursday, compared Smarter Balanced tests to those created by the Partnership for Assessment of Readiness for College and Careers, or PARCC; ACT Aspire; and the Massachusetts Comprehensive Assessment System, or MCAS. Like Smarter Balanced, PARCC tests were developed specifically to measure student achievement according to Common Core standards. The ACT Aspire test, administered in multiple states, also measures achievement based on Common Core standards, while the “MCAS was included because it represents what has been considered ‘best in class’ for individual state assessments up until this point,” according to the Human Resources Research Organization, or HumRRO, which conducted the study of high school tests.
Smarter Balanced tests that are given to 11th-graders in California are used by California State University and nearly 80 community colleges statewide to measure student readiness in math and English language arts.
“In the new era created by the federal Every Student Succeeds Act signed into law in December 2015, states have the responsibility to ensure their educational systems produce students who are prepared for the worlds of higher education and work,” HumRRO’s executive summary states. “Student assessments are the primary mechanism for gauging the success of state and local educational systems.”
The Fordham Institute’s study, “Evaluating the Content and Quality of Next Generation Assessments,” looked at tests given to 5th- and 8th-graders.
Both reports were based on “Criteria for Procuring and Evaluating High-Quality Assessments,” released in 2014 by the Council of Chief State School Officers, a nonpartisan nonprofit composed of education leaders from every state in the country.
Experts looked at test questions and answers, along with scores, and rated the tests based on math and English language arts content, as well as the depth of questions. Accessibility for students with special needs and English learners was also rated.
In both reports, reviewers’ scores were aggregated to assign one of four ratings in each sub-category, based on how well the tests matched the criteria: Excellent Match, Good Match, Limited/Uneven Match, or Weak Match. Overall scores were not given, and the reports did not study the tests’ reliability or validity, that is, their consistency or their ability to measure what they claim to measure.
HumRRO’s report gave Smarter Balanced’s high school exams Excellent Match or Good Match ratings in all areas. The Fordham report, on the other hand, gave Smarter Balanced high marks in all areas except “high quality” questions and “variety” of question types in math, for which it received Limited/Uneven Match.
The Fordham report suggested that Smarter Balanced could strengthen its tests by removing “serious mathematical and/or editorial flaws” found in some questions. During a panel discussion Thursday about the report, Morgan Polikoff, assistant professor of education in the Rossier School of Education at the University of Southern California, said some reviewers found math questions in Smarter Balanced tests that could have more than one correct answer.
The Fordham report said, “The program could better meet the depth criteria by ensuring that all items meet high editorial and technical standards and that a given student is not presented with two or more virtually identical problems.”
In a written response to the Fordham study, Smarter Balanced Executive Director Anthony Alpert said the report did not adequately evaluate the “computer-adaptive” nature of the tests, which customize test questions for each student based on their answers to previous questions.
Alpert also described the rigorous process Smarter Balanced uses to create “only high-quality questions on our tests,” but promised to use the study to improve its process for test question development.
“Immediately,” he said, “Smarter Balanced will initiate a detailed review of the existing test questions based on the feedback from this report.”
Although Smarter Balanced received mostly Excellent Match ratings for its high school test, it was rated Good Match in many of the Fordham report categories, including English language arts vocabulary and language skills, and in the majority of subcategories under “depth.”
“The program is most successful in assessment of writing and research and inquiry,” the report said.
It praised Smarter Balanced for assessing listening skills, which none of the other test companies did. However, the report suggested that Smarter Balanced could improve its language arts tests by strengthening its vocabulary questions, increasing the difficulty level of 5th-grade questions and, “over time, developing capacity to assess speaking skills.”
Alpert responded: “With English language arts, we will discuss the report’s findings with our membership and consider changes.”
His response to the more favorable HumRRO report touted the high ratings Smarter Balanced received as evidence of the tests’ alignment to Common Core standards, and their ability to assess college and career readiness.
“As educators become more knowledgeable about Smarter Balanced,” Alpert wrote, “we are confident our assessment system will continue to be recognized as a historic and groundbreaking system to improve teaching and learning.”
Although some states have decided to use SAT or ACT tests to assess college and career readiness, the HumRRO report did not evaluate those.
“ACT Aspire, PARCC, and Smarter Balanced are administered to about 40 percent of students in grades 3-11 nationwide,” said Suzanne Tsacoumis, HumRRO vice president, in an e-mail. “When this study was undertaken, both SAT and ACT were undergoing major overhauls of their tests that wouldn’t be ready for several months. A review of the soon-to-be-outdated tests wouldn’t be useful. Of course, now that they’re available, we encourage them to subject their tests to external review.”
Tsacoumis said one of the main reasons the studies were conducted was that students often pass their states’ standardized tests, yet are unprepared for college courses and need remediation. This has resulted in assumptions by some that the “tests are not of high quality,” she said.
The goal of the studies was to assess the quality of the tests, based on Common Core standards, to help states decide which tests to use in the future, according to the reports.