Unobserved Confounders: How Admitting What We Don’t Know Can Unlock Data’s Full Potential
Unobserved confounders have long been considered a limiting factor in the accuracy of statistical models, but machine learning’s ability to approximate underlying patterns in data opens up new possibilities. Acknowledging these unknown, non-linear relationships, and repositioning them as fundamental attributes of any dataset, presents an opportunity to advance statistical theory and its applications in machine learning. […]
The Limitations of LLMs: Why Simple Models Still Outperform in Most Use Cases
LLMs and deep learning are highly effective in some contexts, but most industry use cases are still better addressed by traditional ML models. We discuss where the LLM hype is warranted, where it’s overblown, and how simpler models can deliver some of deep learning’s advantages. It’s common industry knowledge that LLMs […]
Redefining Data Quality: A Paradigm Shift in the Machine Learning Pipeline
Data quality issues are among the central challenges in machine learning, but what do we mean by “quality”? Here we redefine and reframe the term to clarify both the problem and its potential solutions. Data quality can mean everything from how accurately a dataset reflects real-world events, to its consistency in formatting, to whether […]