Unobserved Confounders: How Admitting What We Don’t Know Can Unlock Data’s Full Potential

Unobserved confounders have long been considered a limiting factor in the accuracy of statistical models, but machine learning’s ability to approximate underlying patterns in data opens up new possibilities. Acknowledging and repositioning these unknown, non-linear relationships as fundamental attributes of any dataset presents an opportunity to advance statistical theory and its applications in machine learning. […]

The Limitations of LLMs: Why Simple Models Still Outperform in Most Use Cases

LLMs and deep learning are highly effective in some contexts, but most industry use cases are still better addressed by traditional ML models. We discuss where the LLM hype is warranted, where it’s overblown, and how simpler models can deliver some of the same advantages as deep learning. It’s common industry knowledge that LLMs […]

Redefining Data Quality: A Paradigm Shift in the Machine Learning Pipeline

Data quality issues form some of the central challenges in machine learning, but what do we mean by “quality”? Here we redefine and reframe the term to clarify both the problem and its potential solutions. Data quality can mean everything from how accurately a dataset reflects real-world events to its consistency in formatting to whether […]

Case Study: stuffmart.com Customer Conversion

TL;DR. Background: This case study focuses on predicting customer conversion in the online retail space using real-world customer behavior data and enhancing these predictions with the Dark Matter algorithm. Customer conversion is crucial for business success, directly impacting revenue, growth, and competitiveness, with strategies like A/B testing fostering continuous improvement and innovation. Results: We’ve improved […]

Case Study: Banking Customer Churn Prediction

TL;DR. Background: “Churn” refers to the rate at which customers halt their business with a company. This case study focuses on predicting churn in the banking industry using real-world data and enhancing predictions with the Dark Matter algorithm. Results: We’ve improved the accuracy of churn predictions, reducing unnecessary business decisions for customers who aren’t at […]

Case Study: Kinase Cancer Inhibitor Dataset & Performance

Background: When selecting a dataset for a biotech case study, it was important to find something unique to the domain. Ideally, the task has many sparse features, a hard-to-discern signal-to-noise ratio, and a small number of examples. These are key characteristics of this dataset of cancer inhibitor protein interactions. How […]
