The Question Concerning Technology (in the Statistics classroom)

February 01, 2015 essays

In the past thirty years, cheap, ubiquitous computing power has allowed the field of statistics to address a wide variety of questions that previously would have been impractical. Any quantity without a closed-form expression would have been virtually impossible to calculate by hand, and problems involving a large number of coefficients would have been unthinkable to solve. But computing has also made it easier than ever for someone with little statistical understanding to use a statistical package, treat a procedure like a black box, and obtain a p-value without understanding the assumptions inherent in that procedure. How should the field of statistics teach technology in a way that addresses both the increasing importance of computers and the dangers inherent in using them blindly?

I propose a system in which upper-level statistical methods classes are taught with four types of questions that cover the same material: interpretation, programming, theory, and hand calculations. The largest focus should be on interpretation and the smallest on hand calculations, which should only be used to provide a cursory understanding of how a procedure works. Now that computers are the “tools of the trade”, it makes little sense to have students spend hours punching numbers into their calculators when a computer can do the work and get the right answer in a fraction of a second. The time now spent on hand calculations should instead be spent programming and doing analysis in a language like R, with the chosen language a mandatory component of the class. Programming a regression from scratch would be tedious at first but very rewarding for students in the long run. The analysis itself should be as open-ended as possible. Instead of asking “Build a model with X1, X2, X3, and X4”, problem sets should pose a research question and ask the student to pick an appropriate model using techniques taught in class.
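
To give a sense of what programming a regression by hand might involve, here is a minimal sketch of simple linear regression using the textbook least-squares formulas. It is written in Python rather than R purely for illustration, and the function names `simple_ols` and `r_squared` are made up for this example, not taken from any package.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sum of squares of x and sum of cross-products, both centered at the means.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x has no variation, so the slope is undefined")

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
```

A student who has written something like this can then compare the output against the statistical package's built-in routine, which is one way to check both the code and the understanding behind it.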

Of course, there’s always the worry that an increasing focus on technology would distract from the math in theoretical courses. Fortunately, technology often enhances students’ understanding in theoretical courses because they can directly see theoretical results! Proving the Law of Large Numbers may be a vital part of an undergraduate statistics education, but running a simulation in R allows a statistics student to directly see how it applies. Discussing MCMC in the context of a stochastic processes class makes sense, but implementing a Gibbs sampler in R would be an incredibly valuable experience for students. Doing so would not only show that a student understands the underlying theory, but would also enhance the student’s understanding of how a Gibbs sampler works. The same is true for any other algorithm that a statistical package uses. Manually coding a particular algorithm both improves a student’s programming skills and demonstrates that the student knows what the computer is doing internally when it spits out a quantity. It addresses both problems I outlined at the beginning: the growing importance of computers and the danger of using them blindly.
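
As a rough sketch of the two exercises mentioned above, the block below simulates the Law of Large Numbers with die rolls and implements a Gibbs sampler for a standard bivariate normal with correlation `rho`. It uses Python in place of R for illustration. The choice of target distribution, the function names, and the default burn-in are all assumptions made for this example.

```python
import math
import random


def running_means(draws):
    """Return the mean of the first 1, 2, ..., n values in draws."""
    total = 0.0
    means = []
    for count, value in enumerate(draws, start=1):
        total += value
        means.append(total / count)
    return means


def simulate_die_rolls(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times (expected value 3.5)."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_rolls)]


def gibbs_bivariate_normal(n_samples, rho, burn_in=500, seed=0):
    """Gibbs sampler for (X, Y) standard bivariate normal with correlation rho.

    Each full conditional is itself normal:
        X | Y = y  ~  N(rho * y, 1 - rho^2)
        Y | X = x  ~  N(rho * x, 1 - rho^2)
    so the sampler alternates between drawing x given y and y given x.
    """
    if not -1 < rho < 1:
        raise ValueError("rho must be strictly between -1 and 1")
    rng = random.Random(seed)
    cond_sd = math.sqrt(1 - rho ** 2)

    x, y = 0.0, 0.0  # arbitrary starting point
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if step >= burn_in:  # discard the early draws
            samples.append((x, y))
    return samples


if __name__ == "__main__":
    means = running_means(simulate_die_rolls(100_000, seed=2015))
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, round(means[n - 1], 4))
```

Plotting the running means against the number of rolls, or the sampled pairs against the known target distribution, lets a student see the theoretical result rather than just prove it.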

I also propose that professors should stop giving out textbook data sets. They are often carefully conditioned to teach a single concept or produce a particular result. Replacing them with messy, real-world data sets would better prepare students for the complexities of working with real data. Real data is not nicely formatted. Issues like spacing, field organization, incorrectly entered values, and missing data are always parts of real data sets. Allowing students to work with and around these issues provides experience with a messy, fundamental part of data analysis in 2015. The most recent issue of Amstat News features interviews with faculty at the fastest-growing undergraduate statistics departments in the world. Carnegie Mellon’s Rebecca Nugent and Howard Seltman mentioned that they stopped using textbook data sets past introductory classes and that the change has been an astounding success.
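
To make the kinds of issues listed above concrete, here is a small sketch of the cleanup a student might have to write before any analysis can start: stray spacing, missing-value markers, and values typed in incorrectly. It is in Python rather than R for illustration. The set of missing-value markers and the function names are assumptions, and treating an unparseable entry as missing is only one possible policy.

```python
import csv
import io

# Strings treated as "missing"; this particular set is an assumption.
MISSING_MARKERS = {"", "na", "n/a", "null", "."}


def parse_number(raw):
    """Return a float, or None if the field is missing or not a number."""
    text = raw.strip()
    if text.lower() in MISSING_MARKERS:
        return None
    try:
        return float(text)
    except ValueError:
        # An incorrectly entered value (e.g. a word in a numeric column).
        return None


def load_messy_csv(text, numeric_columns):
    """Read CSV text into a list of dicts, tidying names and values."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for record in reader:
        clean = {}
        for key, value in record.items():
            if key is None:  # extra fields beyond the header; skipped here
                continue
            name = key.strip()
            value = (value or "").strip()  # short rows yield None values
            if name in numeric_columns:
                clean[name] = parse_number(value)
            else:
                clean[name] = value
        rows.append(clean)
    return rows
```

In a real assignment, students would also want to count how many values were dropped and decide how to handle them, since silently discarding bad entries can change the analysis.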

Computers are no longer an optional part of a statistics education. Statistics departments should recognize their importance in the contemporary statistics world and teach programming as appropriate.