All students need to learn data science


We live in a world driven by data. Data is collected and stored on every human interaction, whether commercial, civic or social. Enormous server “farms” across the world save, preserve and serve data on demand. A list of the most in-demand jobs includes data scientist and statistician. Algorithms determine prison sentences, scan video feeds to identify potential suspects of crimes, and assist in decisions regarding loans, college admissions and employment interviews.

But problems lurk. Algorithms trained using data that poorly represent the populations to which they are applied leave members of some groups at greater risk of being mistakenly incarcerated. Data models developed without input from contextual experts exacerbate existing patterns of racism and sexism. Data is stolen, allowing thieves to impersonate others and steal millions. Privacy is threatened, and your local grocery chain may know more about your medical conditions than your closest family members. 

Would it surprise you, then, to learn that high school students are not required to study statistics or data science? Fortunately, even though such courses are not required, for more than a decade a growing number of California high school students have had the opportunity to take statistics courses (and, since 2013, data science courses) to meet the admissions requirements of the University of California and the California State University systems. Currently, this pathway to college access is being reviewed by the University of California Academic Senate. Closing it will make it even more difficult for students to learn relevant and necessary skills for 21st-century life.

I, along with other statisticians, view data science as a much-needed upgrade of the current statistics curriculum. It was in this spirit of modernization that I joined a team of high school teachers, UCLA statisticians, computer scientists and education researchers to develop the Introduction to Data Science, or IDS, course. This course, supported by the National Science Foundation and the first (I believe) yearlong high school data course in the U.S., was designed to reflect the modern practice of statistics (which relies on computers, algorithms and both predictive and inferential modeling) better than existing high school statistics courses do.

The course was approved in 2013 as a statistics course by UC’s High School Articulation Unit. This came as no surprise: Introduction to Data Science was designed as a statistics course following guidelines established by the American Statistical Association, the National Council of Teachers of Mathematics, and the Common Core state standards. The approval was not the result of a flawed process, as some have alleged. Statistics courses have long been approved as high school math courses without being required to teach Algebra II standards.

For some reason, this long-standing practice has recently been viewed as controversial, leading to the current UC review and allegations that data science courses offer insufficient algebraic rigor. The real issue is about the purpose of high school mathematics education. Is it designed only to serve students who will major in science, technology, engineering and math, which requires advanced algebra at some point, or should it serve the needs of all students? And if it is meant to serve only future STEM students, is Algebra II the only starting point? The real issue isn’t about offering “weak” math or strong math, but about providing rigorous courses that prepare students for life in the modern data-driven world. Modern statistics courses provide foundational skills and knowledge that are needed by most (if not all) high school students.

Don’t just ask me. After all, I am one of the developers. Ask high school leaders. There has been widespread demand for these courses. Since our initial pilot in 10 schools in 2014–15, Introduction to Data Science has expanded and is now offered in 189 high schools around the nation, and more than 400 high schools around the state are offering one of the available data science courses.

Ask the researchers who found that courses such as ours improved college preparation and matriculation.

Ask leaders at UC Berkeley, among the first universities to recognize the importance of data science. In establishing their wildly popular introductory data science course, Data 8, they emphasized that the instructional approach “should not be viewed as ‘going soft on the math’” and that “conceptual understanding can be developed, perhaps even better developed, through direct experience and computational actions performed with one’s own hands, rather than through symbolic manipulation.” 

While it is true that high school students shouldn’t be forced to make “major” life decisions such as whether to take Algebra II and embark on the STEM path, for many students, this decision is made for them. One study of over 450,000 California high school students found that of those who passed Algebra I, only 40% continued to Algebra II. Courses such as Introduction to Data Science create more opportunities for students to develop mathematical skills and prepare to attend a four-year college — and even to take Algebra II if they choose. 

Statistics and data science courses prepare students to address many of the major issues of our time. STEM students are not excused from the need to study data science. Many recent scandals and controversies in scientific work have centered around the misuse and misunderstanding of fundamental statistical concepts. These challenges point to the need for students of STEM to deepen their study of data science.

All students need data science; some students also need Algebra II. Not the other way around.

•••

Robert Gould is a teaching professor at the UCLA Department of Statistics and Data Science, a fellow of the American Statistical Association, founder of the ASA DataFest competition, and co-author of a college introductory statistics textbook: Exploring the World through Data.

The opinions expressed in this commentary represent those of the author. EdSource welcomes commentaries representing diverse points of view.
