The Best of Both Worlds: Machine Learning and Statistics – Interpretable Models with Big Data

On 16 September 2025, Princeton University’s Department of Psychology hosted a Seminar in Advanced Research Methods lecture featuring Dr Josh Starmer, founder and CEO of the educational platform StatQuest. In “The Best of Both Worlds: Machine Learning + Statistics = Interpretable Models with Big Data,” Dr Starmer explored how traditional statistical linear models and modern machine learning techniques can be combined—through regularization—to yield interpretable yet scalable approaches for extracting insights from large datasets.

Opening Remarks and Speaker Introduction

The session opened in room A03 of the Princeton Neuroscience Institute with a brief welcome from the department’s seminar coordinator, who introduced Dr Josh Starmer: he holds a PhD in computational biology, previously served on the faculty at the University of North Carolina at Chapel Hill, and has collaborated with DeepLearning.AI alongside Dr Andrew Ng. Highlighting his reputation as the “patron saint of data science” and his authorship of the StatQuest Illustrated Guides to Machine Learning and to Neural Networks and AI, the coordinator noted Dr Starmer’s talent for unpacking complex concepts without oversimplification.

Bridging Machine Learning and Statistics

Dr Starmer began by contrasting the typical machine learning approach, which prioritizes predictive accuracy on vast datasets with minimal concern for underlying mechanisms, with the statistical emphasis on interpretability and theoretical guarantees. He illustrated how decision trees classify observations and how training and testing splits are used to evaluate them, then shifted to linear regression’s ability both to predict outcomes (e.g., revenue from store counts) and to quantify confidence via residual analysis.
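To make the contrast concrete, here is a minimal sketch (my own illustration, not code from the talk), assuming scikit-learn and NumPy; the store-count and revenue data are simulated just for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
stores = rng.uniform(1, 50, size=(200, 1))               # predictor: store counts
revenue = 3.0 * stores.ravel() + rng.normal(0, 5, 200)   # outcome: revenue
profitable = (revenue > revenue.mean()).astype(int)      # binary label for the tree

# Machine learning emphasis: classify observations, judge by held-out accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(stores, profitable, random_state=0)
tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)
print("tree accuracy on test split:", tree.score(X_te, y_te))

# Statistical emphasis: an interpretable slope and intercept for revenue itself.
lm = LinearRegression().fit(stores, revenue)
print("revenue ~", lm.intercept_, "+", lm.coef_[0], "* store count")
```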

Interpreting Linear Models: R-Squared and p-Values

Using simple two- and three-point examples, Dr Starmer demonstrated how the sum of squared residuals around a fitted line compares to that around a mean-only prediction. He showed that R-squared describes the percentage reduction in error gained by incorporating a predictor, and that its p-value, estimated by simulating random data, assesses whether an observed R-squared could arise by chance. With only two points, a line always fits perfectly, so R-squared equals one even for random data and the p-value is 1.0; with three points, his histogram-based approach yielded an R-squared of 0.44 and a p-value of 0.53, underscoring the need for sufficient data to draw reliable inferences.
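In the standard definition he was using, R-squared compares the error around the mean to the error around the fitted line: R² = (SS(mean) - SS(fit)) / SS(mean). Below is a minimal sketch of the simulation idea (my own reconstruction, assuming NumPy; the three data points are illustrative, so the printed numbers will not match the talk’s 0.44 and 0.53).

```python
import numpy as np

def r_squared(x, y):
    """R^2 = (SS(mean) - SS(fit)) / SS(mean) for a least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    ss_fit = np.sum((y - (slope * x + intercept)) ** 2)
    ss_mean = np.sum((y - y.mean()) ** 2)
    return (ss_mean - ss_fit) / ss_mean

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 3.0])          # three illustrative points
observed = r_squared(x, y)

# Build a histogram of R^2 values from random data of the same size;
# the p-value is the fraction of random R^2 values >= the observed one.
random_r2 = np.array([r_squared(rng.normal(size=3), rng.normal(size=3))
                      for _ in range(10_000)])
p_value = np.mean(random_r2 >= observed)
print(f"R^2 = {observed:.2f}, p = {p_value:.2f}")
```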

From Simple to Multiple Regression and Beyond

Extending beyond one predictor, Dr Starmer introduced multiple regression—fitting a plane when predicting revenue from both store counts and product variety—and emphasized that the same R-squared and p-value framework applies. He then showed how categorical predictors lead to t-tests (comparing two group means) and ANOVA (multiple groups), and how combining continuous and discrete variables yields ANCOVA. Throughout, the unifying theme was that all these “different” techniques are instances of linear models distinguished only by how predictors are encoded.
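The following sketch (my own illustration, assuming statsmodels and NumPy; the data are simulated) shows that unifying idea in code: with a 0/1-coded group variable, an ordinary least-squares fit reproduces the classic pooled-variance two-sample t-test, and adding a continuous predictor to the same model gives ANCOVA.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
group = np.repeat([0.0, 1.0], 20)              # categorical predictor, dummy coded
stores = rng.uniform(1, 50, size=40)           # continuous predictor
revenue = 10 + 5 * group + 2 * stores + rng.normal(0, 3, 40)

# t-test as a linear model: revenue = intercept + slope * group.
# The slope on 'group' is the difference in group means, and its
# t statistic and p-value match the pooled two-sample t-test.
fit1 = sm.OLS(revenue, sm.add_constant(group)).fit()
print(fit1.params, fit1.pvalues)

# ANCOVA as the same linear model plus the continuous predictor.
X = sm.add_constant(np.column_stack([group, stores]))
fit2 = sm.OLS(revenue, X).fit()
print(fit2.params, fit2.pvalues)
```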

The Power of Regularization

To reconcile statisticians’ desire for interpretability with the high-dimensional settings favored in machine learning, Dr Starmer presented regularization methods, specifically the Lasso (L1) penalty. He described how adding λ·|slope| to the sum-of-squares objective automatically shrinks or zeros out coefficients for uninformative variables. By plotting how the optimal slope shifts toward zero as λ increases (from no penalty up to the point where a horizontal line minimizes the penalized loss), he illustrated how regularization filters out noise and retains only useful predictors, as sketched below.
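Here is a minimal sketch of that shrinkage behavior (my own illustration, assuming scikit-learn and NumPy; the data are simulated). Note that scikit-learn calls the penalty strength alpha rather than λ. On data like this, the slopes of the four noise predictors hit exactly zero well before the informative one does, which is the variable-selection behavior described above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # five candidate predictors
y = 3.0 * X[:, 0] + rng.normal(0, 1, 200)     # only the first one is informative

# Larger penalties push uninformative slopes exactly to zero,
# and eventually shrink even the informative slope toward zero.
for lam in [0.01, 0.1, 1.0, 3.0]:
    coefs = Lasso(alpha=lam).fit(X, y).coef_
    print(f"lambda = {lam}: slopes = {np.round(coefs, 2)}")
```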

Key Takeaways

Throughout the talk, Dr Starmer emphasized that linear models—when augmented with regularization—offer the “best of both worlds”: they remain computationally efficient, theoretically grounded (via R-squared and p-values), and capable of handling large numbers of variables without manual stepwise selection. For researchers aiming both to understand underlying processes and leverage big data, this approach provides a clear, actionable framework.


The Princeton University Department of Psychology is a leading hub for exploring the biological, cognitive, and social foundations of behavior. By integrating cutting-edge research in neuroscience, computational methods, and social science, the department fosters interdisciplinary collaboration that advances our understanding of mind and mental health. Through rigorous training and global partnerships, it prepares scholars to translate psychological insights into innovative solutions for education, healthcare, and policy, empowering graduates to address pressing challenges and promote well-being worldwide.

The Conf is a platform that reports on scholarly conferences, symposia, roundtables, book talks, and other academic events. It is managed by a group of students from leading American and European universities and is published by Alma Mater Europaea University, Vienna.
