Unobserved Confounders: How Admitting What We Don’t Know Can Unlock Data’s Full Potential
Unobserved confounders have long been considered a limiting factor in the accuracy of statistical models, but machine learning’s ability to approximate underlying patterns in data opens up new possibilities. Acknowledging and repositioning these unknown, non-linear relationships as fundamental attributes of any dataset presents an opportunity to advance statistical theory and its applications in machine learning. […]
The Limitations of LLMs: Why Simple Models Still Outperform in Most Use Cases
LLMs and deep learning are highly effective in some contexts, but most industry use cases are still better addressed by traditional ML models. We discuss where the LLM hype is warranted, where it’s overblown, and how we can achieve some of the same advantages of DL with simpler models. It’s common industry knowledge that LLMs […]
Redefining Data Quality: A Paradigm Shift in the Machine Learning Pipeline
Data quality issues form some of the central challenges in machine learning, but what do we mean by “quality”? Here we redefine and reframe the term to clarify both the problem and its potential solutions. Data quality can mean everything from how accurately a dataset reflects real-world events to its consistency in formatting to whether […]
Case Study: stuffmart.com Customer Conversion
Background: This case study focuses on predicting customer conversion in the online retail space using real-world customer behavior data and enhancing these predictions with the Dark Matter algorithm. Customer conversion is crucial for business success, directly impacting revenue, growth, and competitiveness, with strategies like A/B testing fostering continuous improvement and innovation. Results: We’ve improved […]
Ensemble Raises $3.3M Seed Round Led by Salesforce Ventures to Accelerate Machine Learning Experimentation
SAN FRANCISCO, CA – September 9, 2024 – Ensemble, a company dedicated to lowering barriers to state-of-the-art machine learning (ML), today announced it has raised $3.3M in seed funding, led by Salesforce Ventures with participation from M13, Motivate, and Amplo. “This year, we launched a new embedding API that learns to approximate hidden relationships in […]
Case Study: Banking Customer Churn Prediction
Background: “Churn” refers to the rate at which customers stop doing business with a company. This case study focuses on predicting churn in the banking industry using real-world data and enhancing predictions with the Dark Matter algorithm. Results: We’ve improved the accuracy of churn predictions, reducing unnecessary business decisions for customers who aren’t at […]
Case Study: Kinase Cancer Inhibitor Dataset & Performance
Background: When selecting a dataset for a biotech case study, it was important to find something unique to the domain. Ideally, the task would have many sparse features, a signal-to-noise ratio that is difficult to discern, and a small number of examples. These are key characteristics of this dataset of cancer inhibitor protein interactions. How […]