The students in John Daniels’ U.S. history class at James Lick High School in East San Jose are a small sample of the tens of thousands of juniors taking the Smarter Balanced Assessment Consortium field test this spring. And their views of the new test on the Common Core State Standards are just one snapshot of the many that the test’s creators and the state Department of Education will receive over the next two months.

But what they said last week, representative or not, would probably please the creators of the new assessment. As Glenn VanderZee, James Lick’s principal, observed, most of them “got it.”

Not necessarily the answers. Neither James Lick administrators nor the students will know how they did; as with all students in grades 3 through 8 and 11 in California and elsewhere, their tests won’t get scored. The purpose of the field test is to inform Smarter Balanced, a consortium of states that includes California, about the validity of the 20,000 questions being vetted and about aspects of the computer-based technology that need tweaking.

In a classroom discussion and follow-up interviews, the James Lick students said they understood the nature of the new assessment – how it’s different from the California Standards Tests that they grew up taking – and why the new tests might be an improvement.

Desiree Jones. Credit: John Fensterwald

“With this test, you had to make your point and explain your answer,” said Desiree Jones. “In the future, you may have to do the same thing – back up your claim – where you work. You can’t just say, ‘That’s good.’ You’ll need to say what you think and why.”

Citing evidence, defending a position

Desiree was referring to the performance assessment part of the test, which represents the biggest change from the state tests. Students were given four articles about a contemporary subject they could relate to. (EdSource agreed, as a condition of speaking to the students, not to discuss any specific questions on the field test.) They were asked to take a position, using evidence from what they read. They could use a split screen to cut and paste from the articles – a task that some students found difficult on their portable Chromebooks, especially for math problems – and they could write as much, and take as much time, as they wanted.

“With this test,” Desiree said, “you had to put down reasons you chose a specific answer – not just fill in a bubble.”

“People want you to lead in the future,” with an ability to think for yourself, said Jazmine De La Cruz.

Teaching students to think critically is a principal goal of the Common Core standards. High school English Language Arts standards emphasize learning how to analyze informational texts. Math standards stress understanding the concepts behind the formulas. The Smarter Balanced tests reinforce these broader objectives. Several of the math questions asked students not just to give the right answer but also to explain their work. The reading questions required typing short answers.

Test prep in the past included a strategy for making an educated guess on multiple-choice questions: eliminate the answers that clearly didn’t make sense, raising the odds of filling in the right one. Demanding short answers forces students to read the passages and do the math – not blow past with random answers.

Thumbs up on online test – with some reservations

Students said there were annoying aspects to taking a test on a computer, but overall they preferred it. It was cumbersome to type out a formula, they said, and they complained there was no scratch paper to solve math problems (scratch paper is in fact allowed, but a proctor on the first day misread the rules).

Cyril Garcia said that an online test should use touchscreen technology. This was a first-generation test – “not really thought out,” he said. But Javier Cruz said that jobs in the future will demand more technology, so it’s important to prepare students for that with online tests.

Jesus Vargas. Credit: John Fensterwald

Desiree said she found online tests neither better nor worse, just different. At least initially, until the tests become routine, students will find that interesting, she said.

Jesus Vargas said that with computer-based tests, students should get results faster (that is Smarter Balanced’s intent). Results from state standardized tests were returned the following fall, too late to be of any use to students who wanted to know which areas they needed to improve, Jesus said.

Juniors at James Lick, as at most high schools, take a range of math courses reflecting their abilities and interests: Algebra II, Geometry, Calculus or nothing at all, since only two years of math are required to graduate.

Some of the students found the math section frustrating, since it included questions on a mix of disciplines – some hard, some easy and in no particular order.

“Geometry concepts are hard to remember,” said Daisy De La Cruz, who is now taking Calculus.

Desiree said, “In the past, questions went gradually from easy to hard. This one was jumbled.”

Field tests are designed to test the validity of questions, not simulate actual tests that students will take starting next year. As a result, there was an intentional randomness in the question selection and order that caught students by surprise. Questions ranged from pre-algebra they took in middle school to graphing problems in pre-calculus, students said.

Next year, that will change. Smarter Balanced is promising an adaptive assessment, an individualized test with questions based on a student’s correct or incorrect answers to previous questions. It will be an integrated exam, measuring a range of knowledge, said Deb Sigman, state deputy superintendent of public instruction and co-chair of Smarter Balanced Assessment Consortium’s executive board. It will count not only as a school accountability tool but also as a measure of an individual student’s readiness for entry-level, for-credit college courses, she said. That will be an incentive for students to take it seriously.
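
Sigman did not spell out the selection mechanics, and production adaptive engines rely on item response theory rather than simple rules, but the basic loop – pick each question based on how the student handled the previous one – is easy to sketch. The item bank, the 1-10 difficulty scale and the one-step rule below are a simplified, hypothetical illustration, not Smarter Balanced’s actual algorithm:

    # A deliberately simplified sketch of adaptive question selection.
    # Real engines estimate student ability with item response theory;
    # this toy version just steps difficulty up after a correct answer
    # and down after an incorrect one.

    import random

    # Hypothetical item bank: questions tagged with a difficulty from 1 (easy) to 10 (hard)
    ITEM_BANK = {d: [f"question_{d}_{i}" for i in range(20)] for d in range(1, 11)}

    def run_adaptive_test(answers_correctly, num_items=10, start=5):
        """Administer num_items questions, adjusting difficulty after each answer."""
        difficulty, administered = start, []
        for _ in range(num_items):
            question = random.choice(ITEM_BANK[difficulty])
            administered.append((question, difficulty))
            if answers_correctly(question):
                difficulty = min(difficulty + 1, 10)  # harder after a correct answer
            else:
                difficulty = max(difficulty - 1, 1)   # easier after a miss
        return administered

    # Simulate a student who answers about 70% of questions correctly:
    # each student sees a different, individualized sequence.
    print(run_adaptive_test(lambda q: random.random() < 0.7))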

Next year, all schools in the East Side Union High School District will switch from subject-specific math courses – Algebra, Geometry, Pre-Calculus – to Integrated Math, an option under Common Core. Integrated Math combines elements of algebra, geometry and statistics in a sequence of three increasingly challenging courses. So each subject should be fresh in students’ minds when they take the 11th grade Smarter Balanced math test, VanderZee said.

Way to reach disconnected students

James Lick has an overall Academic Performance Index of 674; nearly half its students are English learners, and it ranks in the bottom fifth of schools on standardized test scores. Daniels said he is “a fan of the Common Core for our population” and believes it is a way to engage students bored by traditional approaches to American history.

“Some kids in other demographics get skills, like critical thinking, outside of schools,” Daniels said. “We have to do it here so that students can learn to become problem solvers.”

Glenn VanderZee, principal of James Lick High. Credit: John Fensterwald.

VanderZee also is confident that Common Core standards will be a way to reach students who are disconnected from school. His observations after two years of watching students take the Smarter Balanced practice tests reaffirm his support. Paradoxically, some of his lowest-performing students seemed the most interested in the new test. They had the least to lose, he said, because in the past they didn’t have the skills to answer test questions.

“The previous state test took the approach, There is a correct answer, can you find it? This test focuses on them – their ability to come up with a response and defend it. It’s how we engage learners: What is your take?”

“Who complains about testing changes? The ones who did well on the previous test,” VanderZee said. “They did well on state tests as a point of pride. Now they have the most to lose, in terms of changes, and are reacting by saying, ‘You turned the rules on us.’”

John Fensterwald covers education policy. Contact him and follow him on Twitter @jfenster.

Comments (17)

  1. GREGG AUSTIN 10 years ago

    “Who complains about testing changes? The ones who did well on the previous test,” VanderZee said. “They did well on state tests as a point of pride. Now they have the most to lose, in terms of changes, and are reacting by saying, ‘You turned the rules on us.’”

    Who celebrates standardized testing? In part, you’re right: there is a history of those who did well on the SAT, perhaps, crowing a bit. Certainly, those who did not do as well did not offer up comparisons or dinner conversation about the SAT, or its kind.

    But, why are politicians and edu-bureaucrats celebrating standardized testing? Is it because of the desire and political expediency in making the complex and difficult simple? Sound-bites are always a favorite of politics.

    Most disheartening about this principal’s comment, however, is his simplistic view of human behavior. As a classroom teacher, I find it even more disturbing that this principal – so intimate is his access to students, and so serious his charge to educate and safeguard them – believes that education boils down, so simply, to testing.

    This article, probably not intentionally, details everything wrong with the current education regime. The notion that standardized testing is a good thing and the more of it the better is just frightening. The idea that the Common Core “engages students bored by traditional approaches to American history” is equally appalling. Why does a teacher need a list of non-content-related skills to make history engaging? Common Core has nothing to do with engagement and only dictates what a student can do – the student doesn’t need to know anything.

    The Common Core and its affiliated testing systems are perfect for producing non-knowing, non-thinking, and non-questioning minions for the wealthy and politically connected.

  2. GREGG AUSTIN 10 years ago

    “The previous state test took the approach, There is a correct answer, can you find it? This test focuses on them – their ability to come up with a response and defend it. It’s how we engage learners: What is your take?”

    …really? If the student was defending a project, then I would agree with this argument. Only then would a student be able to demonstrate competence and mastery. In fact, there are many schools (i.e., Big Picture) that manage to develop learned students who never take these antiquated (regardless of the name they’re given, i.e., Smarter Balanced) systems of testing – their students even attend well-regarded universities because their high school experience was so exceptional, and they have no qualms about not taking the IB, AP, SAT, or ACT.

    Moreover, project-based assessment naturally synthesizes qualitative and quantitative cognitive skills, for which students actually have to have the correct answer. One of the biggest problems with all the Common Core-associated products/tests is that their religion of relativeness preaches and labels wrong answers (mathematical and otherwise factual) as correct. The Common Core cheerleaders promote the idea that “a response and defense” is equal to correct, for testing purposes. One need only look in a national publisher’s mathematics workbook to note that students are to “find a reasonable answer.” Wow! This is not my idea of higher standards.

    Also, why does the Common Core bandwagon keep arguing that “[standardized] tests teach” or that “[standardized tests] engage learners”? This is the height of desperation. If one wishes to teach and engage students, then challenge them to synthesize knowledge and create something new – now, that is a test.

    Sitting in a room, in an artificial environment, signing legal documents, making an oath to the government, being threatened by the law is not what I call engaging or valuable or beneficial or productive or wise.

  3. Floyd Thursby 10 years ago

    I worry that the kids with the best computers will get the best scores. It could lead to class bias. Just look at a lot of websites: from one computer a site looks great, from another, not so much – and timing is key on these tests. This is why the paper-and-pencil method should have been retained until every child has a fast, functioning computer. I’d hate to be the kid that didn’t get into a dream school due to a bad computer.

    Replies

    • Matthew Hall 10 years ago

      The test is web-based so even a very basic web-capable computer works just fine. I administered this test at another East Side high school using budget Chromebooks…no problems. FYI, in our district the schools with the most impoverished student populations have the best technology owing to funding that privileges under-privileged communities.

  4. Jackie Berman 10 years ago

    I’m confused. The article talks about a CC test in a history class. I thought that the history/social studies test has not been developed, and will not be for several years. (???)

    Replies

    • John Fensterwald 10 years ago

      Jackie: This just happened to be a class where students were available to talk about the Common Core English language arts and math tests they had taken. California has its own history standards that it does not intend to revise. It does plan to create new computer-based history tests based on those standards, but that is far down the line of priorities; it could be at least several years before they are ready.

      • Gary Ravani 10 years ago

        Ah, yes. The CA State Content Standards for “History.” Why would there ever be any effort to revise those? Having taught 7th grade history about every other year for the last 20 of my 35 years in the classroom, one of my favorite standards was Standard 7.6 (8):

        “1. Understand the importance of the Catholic church as a political, intellectual, and aesthetic institution (e.g., founding of universities, political and spiritual roles of the clergy, creation of monastic and mendicant religious orders, preservation of the Latin language and religious texts, St. Thomas Aquinas’s synthesis of classical philosophy with Christian theology, and the concept of “natural law”). ”

        This jewel comes along with a couple of dozen other like standards. You could begin with a concentration on the meaning and application of “synthesis,” then “classical,” then “philosophy,” then “theology.” Top it off with “natural law.” Then, obviously, the students would “synthesize” all of the above.

        This was for 13 year olds.

        It’s funny (or not) that in all my years of wandering the halls of Sacramento and lobbying on education-related issues, I heard lots of blathering about “standards.” I never actually ran into anyone (outside of CDE) who had actually read the standards.

        • Matthew Hall 10 years ago

          This is exactly why a shift towards deeper thinking that involves argument based on cited evidence is so important. The CCSS moves away from the absurdity of granular content (the what) and moves towards process and skill (the how).

          • GREGG AUSTIN 10 years ago

            Actually, CCSS does not wish for, nor do they want, students to include background knowledge. In fact, CCSS is happiest with the students knowing next to nothing…

            “students, just give us your best uninformed opinion of the text. And, don’t worry about knowing or understanding the text. And, don’t worry about “correct” answers. Just give us somewhat of a rationale as to why you wrote what you wrote – it doesn’t even have to follow any laws of logic. Nearly anything you put down could be considered correct. Of course, you will never know if and why we felt your answer to be sufficient – the law won’t allow you to question the validity of the results. You have no rights. So, you know, don’t sweat it. Actually, we prefer you not think for yourself anyway.”

  5. Don 10 years ago

    I’m underwhelmed at the prospect of rater quality given the challenges of human scoring.

    https://www.ets.org/Media/Research/pdf/RD_Connections_21.pdf

    Table 1: Descriptions of Some Common Human-Rater Errors and Biases

    Severity/Leniency: Refers to a phenomenon in which raters make judgments on a common dimension, but some raters tend to consistently give high scores (leniency) while other raters tend to consistently give low scores (severity), thereby introducing systematic biases.

    Scale Shrinkage: Occurs when human raters don’t use the extreme categories on a scale.

    Inconsistency: Occurs when raters are either judging erratically, or along different dimensions, because of their different understandings and interpretations of the rubric.

    Halo Effect: Occurs when the rater’s impression from one characteristic of an essay is generalized to the essay as a whole.

    Stereotyping: Refers to the predetermined impression that human raters may have formed about a particular group that can influence their judgment of individuals in that group.

    Perception Difference: Appears when immediately prior grading experiences influence a human rater’s current grading judgments.

    Rater Drift: Refers to the tendency for individual or groups of raters to apply inconsistent scoring criteria over time.
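
    To make the first of those concrete: severity/leniency is typically spotted by comparing each rater’s average score on a common set of essays against the average across all raters. A toy sketch – the raters and scores below are invented, not from the ETS paper:

        # Toy illustration of detecting severity/leniency bias:
        # compare each rater's mean score on a shared set of essays
        # with the mean across all raters. All data are invented.

        ratings = {  # scores on a 0-4 rubric for the same six essays
            "rater_A": [3, 4, 3, 4, 3, 4],  # tends lenient
            "rater_B": [2, 3, 2, 3, 2, 3],
            "rater_C": [1, 2, 1, 2, 1, 1],  # tends severe
        }

        all_scores = [s for scores in ratings.values() for s in scores]
        grand_mean = sum(all_scores) / len(all_scores)

        for rater, scores in ratings.items():
            offset = sum(scores) / len(scores) - grand_mean
            label = "lenient" if offset > 0.5 else "severe" if offset < -0.5 else "in line"
            print(f"{rater}: mean offset {offset:+.2f} ({label})")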

  6. Doug McRae 10 years ago

    Very good post on the student perspective for new computerized testing formats, John. It reveals both the potential benefits and the current hurdles inherent in a conversion from traditional paper/pencil formats to computerized testing.

    I might make a couple of comments on several of the specifics in the post:

    First, the speculation that with computer-based tests “students should get results faster (that is Smarter Balanced’s intent)” is not likely to be realized in the near- or medium-term future. SB’s plan is to have roughly 70% machine-scored items and 30% human-scored items, with both needed for total scores. The human-scored items will take 4-6 weeks to be scored and merged with the machine-scored results. So, with summative tests given toward the end of the school year, the scoring delay for human-scored items means that student results, even in future years, will likely be returned during the summer, similar to previous paper/pencil tests.

    Second, the description that this year’s “field tests are designed to test validity of questions” is accurate; traditionally, test developers call this exercise an “item tryout” study rather than a field test, because it is not even a “test” in the usual use of that word. No one has seen an SB “test” yet, since it has yet to be developed. The 2015 tests will be the first actual SB tests given to CA kids, and 2015 will in truth be a “benchmark” year for the SB program [terminology accurately used by SSPI Torlakson at a budget committee hearing in late Feb] rather than a first “operational” year as promoted by CDE staff and other SB advocates. Data from the benchmark year will then be used to determine scoring rules for the actual tests, and results will not be available until fall 2015. In fact, 2016 will be the first true “operational” year for SB tests in California.

  7. Paul Muench 10 years ago

    Now it’s time for all the software engineers to debug the grading algorithms used on the free-form answers. This is one area that’s going to need a lot more transparency if we are to believe computer systems can assess creative and critical thinking.

    Replies

    • Paul 10 years ago

      This is a big problem, and the last time I checked, the test vendor had made the wrong choice with multiple response questions, which don’t even allow free-form responses. The sample multiple response items that I tried were to have been scored on an all-or-nothing basis. This makes the scores less useful for diagnosis (the distinction between assessment and evaluation), discourages students, and reflects the same obsession with correctness over comprehension inherent in the CST and in California’s old math standards.

      As for the scoring of free-form responses, my guess is that inter-rater reliability (perhaps the wrong term?) will matter more than response quality. As long as different scorers agree on a score, the test vendor won’t care what the score is. A useful tactic for test-takers is to enumerate, caption, or subtitle, so that the evaluator can see that every element of the question has been addressed. Raters are human, are paid by the response, and are evaluated for consistency with other raters.

      I’m grateful for CCSS and SmarterBalanced, but not hopeful that these tests will provide better diagnostic feedback to students, teachers and families.

      John and Doug – Test calibration/norm-setting situations aside, I thought that the delay in scoring the CST stemmed from the state’s decision not to pay that vendor for fast scoring. I believe that the state did finally pay for fast scoring last year or the year before — just in time for the demise of the CST. I know that big-name graduate/professional school admission tests that are offered in an adaptive CBT format return tentative scores for the multiple choice portion at the close of the testing session. Nothing prevents the state from making the same investment in K-12 students. Tentative scores available by mid-May could aid in placement and promotion decisions.

      • Doug McRae 10 years ago

        Paul:

        On your Q re delay in scoring CSTs: until 2012, CA chose to calibrate previous-year test forms to current-year test forms based on early tests submitted for scoring, a safe, more conservative way to calibrate at the expense of faster return of results (results not available until at least July). For 2013, CA changed to using prior-year CST test forms, such that current-year calibration was not needed, thus allowing for return of student-level scores soon after answer documents were received for scoring – earlier than July (or so the SBE was told) for most districts. Paying for faster scoring was not involved in that decision, nor was it really a factor in the choice of the previous calibration method.

        Re higher ed admission tests that offer a tentative score for the MC section at the close of a testing session: that is an option for SB tests. But my understanding is that most higher ed admission tests involve primarily MC questions, with very limited or no human-scored questions, and hence the tentative score is almost always the same as the official score that comes later. Thus, labeling the immediate score as a “tentative score” is really just CYA in case something unusual is detected from the record of the testing session. Also, graduate-level/professional school computer-adaptive tests involve fewer than 100,000 test administrations, while the SB tests will involve upwards of 8 million. Offering both tentative and final or official scores, especially if there is a good chance the two scores will be different for individual students, could generate significant issues with inadvertent (or perhaps intentional) misuse of the tentative scores. Tentative scores at the end of SB computer-adaptive testing sessions are a possibility that can be explored, but I’m not sure it is a good choice given the potential unintended consequences of having two sets of scores out there.

        Doug

  8. Paul Muench 10 years ago

    My understanding is that ES students already on the prior math sequence will stay on that sequence until graduation.

  9. navigio 10 years ago

    i took some of the online tests and one thing i noticed with the math section was when having to describe area you needed to draw a rectangle on a grid with your mouse. the interface was so bad that it took me like 10 tries just to get the right location (i work with computers 24 hours a day so its not like im technologically incompetent). this made me real nervous when i heard the computers our district bought explicitly for these tests didnt even have mice (rather just track pads). hopefully they can figure out how not to have a substandard user interface have an impact on results.

    Replies

    • Manuel 10 years ago

      navigio, you work 24 hours a day? do you sleep?

      Yes, I have heard through the grapevine the difficulties they are having with iPads, where “the finger” is the input device. Not easy to go back and forth between a keyboard and the screen to interact with the app.

      The other problem is that experienced by anyone trying to move an object on a grid. If the “drag-and-drop” is too rigid and doesn’t have a “proximity” setting, you’ll have a hell of a time positioning the object being dragged. Oh, well, human-factors engineering (or whatever this field is called) will have to be implemented in these tests before they can be truly ready for prime-time.
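
      The “proximity” setting I have in mind is basically snap-with-tolerance: treat a drop as landing on a grid point if it falls within a few pixels of one. A minimal sketch – the function name, pixel units and thresholds are all made up for illustration:

          # Toy snap-with-tolerance for drag-and-drop on a grid.
          # The grid spacing and tolerance are hypothetical; a real test
          # UI would tune them per device (mouse, trackpad, finger).

          def snap_to_grid(x, y, grid=20, tolerance=8):
              """Snap a dropped point to the nearest grid intersection if it
              lands within `tolerance` pixels; otherwise leave it where it fell."""
              def snap(v):
                  nearest = round(v / grid) * grid
                  return nearest if abs(v - nearest) <= tolerance else v
              return snap(x), snap(y)

          print(snap_to_grid(47, 33))  # -> (40, 40): both within 8px of a grid line
          print(snap_to_grid(47, 30))  # -> (40, 30): y is 10px away, so left alone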