As millions of students prepare to take, for the first time, a battery of computer-based assessments aligned with the Common Core, at least portions of the tests will have to be scored the old-fashioned way: by humans.
That’s because the so-called Smarter Balanced tests, aligned with the Common Core State Standards, include essay questions designed to measure critical thinking skills. Even the math tests require students to explain how they reach their answers.
And unlike the old multiple-choice California Standards Tests that students took every year until the spring of 2013, those more complex portions of the Smarter Balanced tests can’t be easily scored by machine.
To score them, the Educational Testing Service,* which will administer the tests under contract with the California Department of Education, is in the process of hiring 6,500 scorers in California. It has almost reached its goal. As of Feb. 19, it had recruited 6,294 people to work as hand scorers, pending their passing certification. Of those, 3,777 had passed certification, according to the California Department of Education.
The use of people rather than computers to rate essay questions has been routine for portions of tests like the Graduate Record Exams and the Graduate Management Admission Test. But human scorers will be used to score many more answers than on previous tests taken by K-12 students in California.
While human scoring has advantages over automated computer scoring on essay-type questions, it also has its disadvantages. How well the testing service recruits and trains scorers could have an impact on individual students’ scores.
A 2013 paper published by the Educational Testing Service noted that “humans can make mistakes due to cognitive limitations that can be difficult or even impossible to quantify, which in turn can add systematic biases to the final scores.” That’s on top of the logistics of managing and training – and paying – thousands of scorers, a process that the testing service paper described as “labor intensive, time consuming and expensive.”
According to an Educational Testing Service recruiting flyer, becoming a test “rater” – the term used in testing parlance – requires a bachelor’s degree in any field, although teaching experience is “strongly preferred.”
Among those who have been hired so far, only 241 are current California teachers.
One of them is Christopher Vue, a math teacher at Washington Union High School in Easton, near Fresno. To get certified as a rater, he recently sat down in front of his home computer to figure out the best way to grade the critical thinking skills of students he’s never met. The test results he was asked to score were those of students who took the Smarter Balanced field tests administered last spring.
On one middle school math problem on “proportional relationships” that Vue was asked to score, a student’s response could earn up to three points. A clear set of guidelines provided by the testing service helped him figure out how to score five possible responses to the same problem, he said.
It took Vue two hours to read and review all of the material for the training and certification. He said that when he begins scoring student responses this spring, he will be able to ask a team leader if he is unsure about how to score a particular answer.
While Vue will earn $13 an hour for working as a scorer, his reason for signing up was not the extra cash. “Because the test is so new, I wanted to see exactly what they’re looking for when they’re assessing students,” Vue said.
Natalie Albrizzio, a math specialist in the Ventura Unified School District, had a similar motivation for becoming a test scorer. But she has already had experience with the process.
When California adopted the Common Core Standards in 2010, her school district created its own math exams that required students to explain the reasoning behind their answers. To determine how to score those exams on a scale of 0 to 3, she said, teachers discussed all the possible responses to the questions. For example, she said, they pondered whether to give a student whose answer was nonsensical a 1 for effort.
The funds to pay for hand scorers will come out of a $24 million budget that’s set aside for test processing, scoring and analysis of the Smarter Balanced assessments, according to officials at the California Department of Education.
California may have used hand scorers in a more limited way on statewide K-12 assessments, but their use has been widespread for decades in other states, including Washington and Connecticut, according to Shelbi Cole, the deputy director of content for the Smarter Balanced Assessment Consortium.
She said that the math and English Language Arts scoring guidelines that Smarter Balanced has distributed to California and other states using its assessments were developed by educators at meetings where they considered numerous answers students might provide to the test questions.
What kinds of responses earn a high score depends on the complexity or difficulty of a test item. For example, earning a top score of 4 on an essay in which a student argues that the British Museum should return the Rosetta Stone to Egypt requires clear sourcing and citations, the use of expert opinions to rebut opposing views, and the appropriate use of vocabulary.
The Educational Testing Service is looking into ways more of the Common Core tests can be scored without human intervention, and experts in the testing field believe that more machine testing is inevitable. The 2013 testing service report concluded that “advances in artificial intelligence technologies have made machine scoring of essays a realistic option… and that it will be used more widely in educational assessments in the near future.”
But for now, scorers like Vue and Albrizzio will be essential to the process.
Administration of the Smarter Balanced assessments will begin in some districts at the end of March, and testing will run into June, depending on the district. Vue is now waiting for instructions about what to do next, including being told which grade levels he’ll be expected to score. “I feel like I’m really in the dark about what’s going to happen next,” he said.
Like Vue, Albrizzio is looking forward to getting a closer look at the tests themselves. “Especially for math, I think it’s important for teachers to participate,” she said.
*Correction: An earlier version of this article incorrectly stated the name of the Educational Testing Service. The story was also updated to reflect the extent to which California is using hand scorers for K-12 assessments.