The Limitations of LLMs: Why Simple Models Still Outperform in Most Use Cases

LLMs and deep learning are highly effective in some contexts, but most industry use cases are still better addressed by traditional ML models. We discuss where the LLM hype is warranted, where it’s overblown, and how we can achieve some of the same advantages of DL with simpler models.

It’s common industry knowledge that LLMs aren’t suited to most tasks, but that hasn’t stemmed the wave of hype that’s swept the media, markets, and boardrooms over the past year.

In part, that fever pitch of attention has come from machines doing what we consider distinctly human things like chatting, writing, and making art. Ask ChatGPT to write a thank-you note to your partner for cooking dinner and you’ll get an eloquent, heartfelt exposition on their value as a person.

In contrast, when machine learning does what we expect it to be good at, like predicting which ads you’ll click on, forecasting weather patterns, or detecting fraud, it seems unremarkable. These are things that people can’t do well at scale because they require thinking statistically (which humans are notoriously bad at).

Of course, it actually is quite remarkable that machine learning can save lives by predicting extreme weather events, flagging early signs of cancer, or predicting the protein structures that explain the inner workings of our biology.

Even so, the reality that machine learning already pervades our lives in less visible ways has mostly flown under the radar. And those use cases largely rely on simple models, not deep learning or large language models.

Where LLMs excel — and where they don’t.

LLMs work well with homogeneous text and image datasets, excel at natural language processing, and (depending on who you ask) pass the Turing test. That’s made them useful for tasks like customer support, content creation and summarization, and programming assistance.

They’re less useful with heterogeneous tabular and time series data: think categorical variables, like lists of colors, and continuous values, like stock prices over time. The core architecture of how an LLM processes data doesn’t lend itself to reliability or precision with this kind of data.

Unfortunately, that’s where a lot of hard, impactful problems live — problems like financial modeling, proteomics and metagenomics, healthcare diagnostics, supply chain optimization, energy management, robotics, and cybersecurity.

In real-world applications, tabular data is still the most common data type, and “traditional” or “classical” approaches like tree-based models still show superior performance in those contexts. Models like XGBoost see widespread industry use because they are both practical and effective in these use cases. They are simply better at learning and reflecting the things that matter most in these settings.
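To make the point about mixed feature types concrete, here is a minimal sketch of the core step inside a tree-based model: choosing a single split. It is not XGBoost or any library’s implementation. The toy rows, the column names, and the helper functions (`gini`, `weighted_gini`, `best_split`) are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy table: (color, amount, is_fraud). All values are invented.
COLUMNS = ("color", "amount")
ROWS = [
    ("red", 12.0, 0),
    ("red", 250.0, 1),
    ("blue", 8.5, 0),
    ("blue", 310.0, 1),
    ("green", 15.0, 0),
    ("green", 500.0, 1),
]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Size-weighted impurity of the two groups a split produces (lower is better)."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_split(rows):
    """Try every candidate split on every column and return (score, rule)."""
    best = None
    for idx, name in enumerate(COLUMNS):
        for value in sorted({row[idx] for row in rows}):
            if isinstance(value, str):
                # Categorical feature: test for equality with one category.
                goes_left = [row[idx] == value for row in rows]
                rule = f"{name} == {value!r}"
            else:
                # Continuous feature: test against a threshold.
                goes_left = [row[idx] <= value for row in rows]
                rule = f"{name} <= {value}"
            left = [row[-1] for row, flag in zip(rows, goes_left) if flag]
            right = [row[-1] for row, flag in zip(rows, goes_left) if not flag]
            if not left or not right:
                continue  # a split that sends every row one way is useless
            score = weighted_gini(left, right)
            if best is None or score < best[0]:
                best = (score, rule)
    return best


print(best_split(ROWS))
```

On this toy table the search settles on `amount <= 15.0` with an impurity of 0.0, and it gets there without scaling the numeric column or encoding the categorical one. Real tree ensembles repeat this step recursively and combine many trees, but the ability to handle mixed columns side by side comes from this basic mechanism.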

It was heartening to see Ravid Shwartz-Ziv’s paper on the superiority of tree ensemble models on tabular data go viral, with more than 1,000 citations to date. Not because it offered anything novel, but precisely because it didn’t: it simply said out loud what everyone in industry was already thinking (and went on to show rigorously that these tried-and-true methods still shine).

This doesn’t come as a surprise — no single approach is going to consistently beat all others across all domains. The landscape of problems in Data Science and Machine Learning is simply too vast for ‘one model to rule them all’. It’s obviously better to recognize the advantages and disadvantages of each model and how its inherent behavioral assumptions align with particular problems.

Rather than applying the newest, shiniest model and trying to make the data conform to it, it’s generally better to choose the model based on the types and structures of data, and that data’s inherent behavior. The structure of data is just as important as how that data behaves statistically, and some models conform to that structure better than others.

Data sparsity, mixed feature types, lack of prior knowledge about dataset structure, and lack of locality make deep learning a non-starter for many applications. But if the theoretical advantages of deep learning haven’t paid off wholesale, that’s not to say that they can’t be achieved in other ways.

Getting the advantages of deep learning on tabular data — without the overhead.

Deep learning excels at capturing complex, non-linear, non-obvious, granular relationships in data, but requires vast amounts of data to do so.

It turns out that a particular type of deep learning called “representation learning” has the potential to do that well too — without the downsides.

This isn’t an “end” model that’s actually used for prediction. Instead, it’s an algorithm that learns how to better represent data to the end model for its respective prediction task.

Representation learning works with tabular and time series data, across domains, and with any end model. Importantly, it’s lightweight, fast, and works with sparse or limited data, making it practical for tasks where deep learning typically wouldn’t be viable.

This algorithm uses a unique loss function that learns how to create new, nearly orthogonal features that account for missing relationships in the data. In lay terms, it finds features in data that we (and our models) didn’t know existed.

Instead of relying on the model to implicitly recover these relationships, it explicitly defines what an optimal embedding should look like, then learns how to create those embeddings from the data — effectively distilling signal from what would otherwise look like noise to the end model.
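The loss function behind this algorithm isn’t spelled out here, so the following is only a stand-in that illustrates what a “nearly orthogonal” new feature means in practice. It takes a hand-picked candidate (the product of two existing columns) and uses plain Gram-Schmidt residualization to strip out everything the existing columns already explain linearly. The data, the choice of candidate, and the helpers (`dot`, `residualize`, `orthogonal_basis`) are all assumptions made for illustration, not the method described above.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def residualize(vector, basis):
    """Subtract from `vector` its projection onto each (mutually orthogonal) basis vector."""
    out = list(vector)
    for b in basis:
        coef = dot(out, b) / dot(b, b)
        out = [o - coef * bi for o, bi in zip(out, b)]
    return out


def orthogonal_basis(columns):
    """Gram-Schmidt: build mutually orthogonal vectors spanning the same space as `columns`."""
    basis = []
    for col in columns:
        r = residualize(col, basis)
        if dot(r, r) > 1e-12:  # skip columns that add no new direction
            basis.append(r)
    return basis


# Hypothetical existing features (invented values) plus an intercept column.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
ones = [1.0] * len(x1)

# Candidate relationship the end model is not given explicitly: an interaction term.
candidate = [a * b for a, b in zip(x1, x2)]

# Keep only the part of the candidate that the existing columns cannot reproduce linearly.
basis = orthogonal_basis([ones, x1, x2])
new_feature = residualize(candidate, basis)

# The end model would now be trained on the original columns plus the new one.
augmented_rows = list(zip(x1, x2, new_feature))

for name, col in (("ones", ones), ("x1", x1), ("x2", x2)):
    print(name, round(dot(new_feature, col), 9))
```

Each printed dot product is zero up to floating-point error, which means the new column carries only information the original columns did not already express linearly. A learned approach would discover such features from the data rather than being handed a candidate, but the place where the result enters the pipeline is the same: as extra columns passed to whatever end model is already in use.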

The immediate impact is better predictive accuracy across problem settings and models, but this also represents a more fundamental innovation that enables other downstream modeling capabilities by shifting complexity from modeling to data. We expect this will become an everyday, indispensable part of every Data Scientist’s toolkit.

If you’re interested in learning more, get in touch to set up a Dark Matter trial. We’re currently looking for a few lighthouse customers to be early adopters at a reduced price in exchange for benchmark data and collaboration on case studies.
