Introduction: AI Doesn’t Always Mean LLM
In the current AI landscape “LLM” (Large Language Model) has become a buzzword. Since the launch of models like GPT-3 by OpenAI in 2020, businesses have been eager to adopt this technology. The rise of LLMs has been further amplified by the launch of ChatGPT in 2022, drawing widespread attention from both academia and popular media. These models, powered by large-scale datasets and immense computational resources, have revolutionized natural language processing (NLP) and demonstrated remarkable capabilities across various domains.
LLMs have not only transformed the field of NLP but have also spurred innovation in numerous industries. Companies in countless domains are racing to integrate LLMs into their operations or to develop entirely new products around them. The media constantly fuels this enthusiasm, highlighting the latest advancements and features powered by LLMs. This widespread excitement has created the impression that leveraging LLM technology is essential to staying competitive.
But here’s the key question: Are LLMs truly the best choice for every use case? In some scenarios, traditional machine learning (ML) methods may offer more effective, cost-efficient, and practical solutions. This article explores when to opt for LLMs and when other AI approaches might better serve your needs. While LLMs are undeniably powerful, they are not a one-size-fits-all solution. Understanding their strengths and limitations will help you make informed decisions about which AI technology best fits your specific requirements.
What is an LLM and What Does it Do?
At its core, the function of a large language model (LLM) appears straightforward: given an input text known as a “prompt”, the model predicts the most likely next token in the sequence and generates text one token at a time. This process, known as autoregressive language modelling, is illustrated in the following image:
In essence, LLMs perform the same fundamental task as earlier language models. Even simple n-gram models, first introduced by Claude Shannon in the 1940s, operated on the same principle of predicting the next word from prior context. While the basic premise remains unchanged, the challenge lies in choosing each next word so that a coherent, semantically meaningful text emerges; it is the method of determining those words that sets models apart.
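The n-gram idea can be made concrete in a few lines of code. The following is a deliberately toy-sized sketch of a bigram model (corpus, tokenization, and greedy decoding are all simplified for illustration); it predicts each next word from simple co-occurrence counts, mirroring the autoregressive loop described above:

```python
from collections import defaultdict, Counter

def train_bigram(corpus):
    """Count, for every word, how often each word follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_tokens=8):
    """Greedily append the most frequent successor, one token at a time."""
    out = [start]
    for _ in range(max_tokens):
        followers = counts.get(out[-1])
        if not followers:  # no known continuation: stop generating
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the mouse sat on the mat",
]
model = train_bigram(corpus)
print(generate(model, "the"))
```

An LLM replaces the count table with a billion-parameter transformer, but the outer loop (predict, append, repeat) is conceptually the same.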
What sets LLMs apart from their predecessors can be attributed to three key factors:
- Scale: LLMs are characterized by their vast number of parameters, often exceeding one billion. This massive capacity enables them to model complex linguistic patterns and nuances.
- Data: To fully leverage their potential, LLMs are trained on enormous datasets, encompassing diverse domains and styles.
- Architecture: Modern LLMs rely on the transformer architecture (or its variations) that effectively handles long-range dependencies in text.
The development and training of such large-scale models has only become feasible in recent years, thanks to advancements in distributed GPU computation.
The impact of LLMs is profound. Beyond generating simple, structured sentences like the one in the above example, these models can comprehend and respond to intricate contexts with remarkable accuracy, making them versatile tools for a wide range of applications.
For more details on LLMs, check out this video by IBM.
The Pros and Cons of LLMs vs. Traditional NLP Models
The ability to process and act upon complex textual queries makes LLMs a multi-task tool for all kinds of text processing problems. They can summarize, translate or even engage in casual conversation. With some creativity, they can perform countless other tasks, thanks to their so-called “emergent” abilities. This versatility makes them highly appealing for various applications. In contrast, traditional natural language processing (NLP) models are typically designed and optimized for a specific task (e.g. sentiment analysis, machine translation or summarization) using task-specific labeled data.
Both LLMs and traditional NLP models come with their own sets of advantages and disadvantages:
Advantages and Disadvantages of LLMs
| Advantages | Disadvantages |
| --- | --- |
| Exceptional context understanding | Often accessible only through costly APIs (e.g., ChatGPT / GPT-4o) |
| Out-of-the-box (or few-shot) multi-task ability | Self-hosting requires expensive hardware |
| Highly flexible | High latencies, especially without high-performance infrastructure |
| Good for tasks with open-ended outputs (e.g. creative writing) | Poorly suited to closed output formats; can lead to unwanted or lengthy outputs |
| Requires minimal training data for fine-tuning | Limited control over output format |
| Good for unstructured text input data | May lack domain- or use-case-specific precision |
| | Prone to hallucinations (producing false or nonsensical information) |
| | Difficult to evaluate results accurately |
Advantages and Disadvantages of Traditional NLP Models
| Advantages | Disadvantages |
| --- | --- |
| Greater computational efficiency | No task generalization or emergent capabilities |
| Faster inference times | A new model must be trained for each task |
| Easier to self-host and deploy | Often requires significant amounts of task-specific data to perform well |
| Ideal for tasks with fixed output formats (e.g. classification) | Poor performance on language generation tasks |
| Good for structured and repetitive tasks | |
| Can be trained in-house without excessive resources | |
| Easier to customize for specific use cases or domains | |
Understanding the advantages and disadvantages of LLMs and traditional NLP models forms the basis for choosing the right technology for your project. But how can this knowledge be applied in practice?
Decision Guide: Should You Use an LLM?
Choosing whether to use an LLM for your specific use case involves considering several factors. The decision tree below provides a structured approach to guide this decision, building on the advantages and disadvantages outlined in the previous section.
While this guide offers a helpful framework, it does not cover every possible use case or product idea, and its questions might not always apply in this exact order. The complexity of real-world scenarios often requires a more nuanced analysis. Nevertheless, the decision tree serves as a useful starting point. If the path through the decision tree does not fit a use case well, adapt and reorder the questions to better suit your specific context.
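One possible linearization of the decision questions can be sketched as a small function. This is an illustrative simplification, not the exact tree from the figure: the real tree may branch differently, and the question order here follows the worked examples in the next section:

```python
def choose_approach(
    generative: bool,             # Do I want to generate complex (creative) text?
    can_afford_llm: bool,         # Ready to pay for API calls or expensive hardware?
    many_tasks: bool = False,     # Solve many different text-related tasks?
    low_latency: bool = False,    # Require low-latency inference?
    strict_output: bool = False,  # Need high control over the output format?
    labelled_data: bool = False,  # Can I obtain labelled training data?
) -> str:
    if not generative:
        # Non-generative task: prefer a task-specific model whenever
        # labelled data is available or LLM costs are prohibitive.
        if can_afford_llm and not labelled_data:
            return "LLM"
        return "Other NLP Model"
    if not can_afford_llm:
        return "Other NLP Model"
    if many_tasks:
        return "LLM"  # only an LLM covers many tasks with one model
    if low_latency or strict_output:
        return "Other NLP Model"
    return "LLM"

print(choose_approach(generative=True, can_afford_llm=True))  # → "LLM"
```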
The following section illustrates how to apply the decision tree using three example use cases.
Using the Decision Guide: Three Example Use Cases
Example 1: Code Suggestion Tool
- Do I want to generate complex (creative) text? → Yes

Decision: Large Language Model (LLM)
Explanation: Code suggestion tools typically take a text string of source code as input and generate helpful suggestions for the next steps. These suggestions vary widely, ranging from a single comment or a logical next line of code to entire functions or code blocks. Since this task requires generating complex text (i.e. source code), an LLM is the appropriate choice.
Example 2: On-Device Topic Classification in a Mobile App
- Do I want to generate complex (creative) text? → No
- Am I ready to pay for recurring API calls or expensive hardware? → No
- Can I obtain labelled training data? → Yes

Decision: Other NLP Method (e.g. encoder-based transformer for classification)
Explanation: Topic classification is not a generative task but a text classification one. While LLMs are capable of handling topic classification, using them on-device is often impractical or infeasible due to their high resource requirements. In this example, we assume that labelled training data is available, so we decide to train a non-generative encoder-based transformer – a more efficient and suitable choice.
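A full encoder-transformer fine-tune is too long for a snippet, so as an illustrative stand-in, here is a minimal bag-of-words Naive Bayes topic classifier in pure Python. The class name, topics, and training sentences are all hypothetical; the point is only to show the shape of a small, task-specific classifier that could run on-device:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTopicClassifier:
    """Tiny bag-of-words Naive Bayes classifier. In practice a
    fine-tuned encoder transformer would take this role; this toy
    version just illustrates the task-specific approach."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.label_counts:
            # log prior + log likelihoods with add-one smoothing
            score = math.log(self.label_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

clf = NaiveBayesTopicClassifier().fit(
    ["stocks fell sharply today", "the match ended in a draw",
     "central bank raises rates", "striker scores twice in final"],
    ["finance", "sports", "finance", "sports"],
)
print(clf.predict("bank cuts interest rates"))  # → "finance"
```

Note the fixed output format: the model can only ever return one of the trained labels, which is exactly the kind of control an LLM struggles to guarantee.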
Example 3: ASR Transcript Correction
- Do I want to generate complex (creative) text? → Yes
- Am I ready to pay for recurring API calls or expensive hardware? → Yes
- Do I want to solve many different text-related tasks? → No
- Do I require low-latency inference? → No
- Do I need high control over the output format? → Yes

Decision: Other NLP Model
Explanation: Automatic Speech Recognition (ASR) systems sometimes produce transcripts with errors. Correcting these involves fixing the erroneous words while preserving the rest of the text as-is. Hence, it is not necessarily a generative task but can be treated as a classification problem. In this scenario, we assume that we are ready to pay for expensive hardware and do not require low latency. However, LLMs do not provide a fixed output format, and it is hard to make them stop generating at a specific point. This could lead to not just fixing the ASR transcript but also to generating more unwanted text. For this reason, a classification model better suits this use case, ensuring precise and controlled corrections without extra, undesired output.
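The classification framing can be sketched in a few lines. In this toy version the "model" is just a lookup table of hypothetical misrecognitions; a real system would use a trained token-level classifier, and decisions like "there" vs. "their" would of course depend on context:

```python
# Toy sketch: ASR-transcript correction as per-token classification.
# Each token is either kept or replaced; the confusion table below is
# a hypothetical stand-in for a trained token-level classifier.
CONFUSIONS = {
    "no": "know",
    "there": "their",  # context-dependent in any real system
}

def correct_transcript(tokens, confusions):
    """Classify each token as keep-or-replace. The model never
    generates new text, so output length always equals input length,
    and generation cannot run on past the transcript."""
    return [confusions.get(tok, tok) for tok in tokens]

tokens = "i no there answer".split()
print(" ".join(correct_transcript(tokens, CONFUSIONS)))
# → "i know their answer"
```

The design choice is the key point: because the output is one classification decision per input token, the fixed output format comes for free, which is exactly the control an LLM lacks here.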
Beyond Text: Can an LLM Help With My Data?
While LLMs excel at handling text, many other types of data – often called modalities – exist, including images, audio, and tabular data. Sometimes, companies seek LLM solutions even when their primary data is not text-based. Although there is ongoing research into multimodal models that combine LLMs with other types of data processing, most of these approaches remain experimental.
For now, it is generally better to rely on tested and proven AI models specifically designed for tasks like image or audio processing. The decision tree below provides guidance on choosing the appropriate AI model for different data types.
Figure 3: Decision tree for selecting suitable AI models by data type © Jannis Spiekermann
The graphic clearly shows that LLMs represent just a small part of the myriad AI models developed in recent years. Depending on the type of data, specialized solutions, such as convolutional neural networks (CNNs) for images or recurrent neural networks (RNNs) for time-series data, are often more suitable.
It is worth noting that many of these models are increasingly being replaced or enhanced by the powerful transformer architecture, which underpins most LLMs. However, these models function differently from LLMs, as they are optimized to address specific problems beyond text generation and analysis.
Conclusion: The Right Solution for Every Modality
LLMs are a versatile multi-task tool for all kinds of text-processing problems. However, as discussed in this blog post, they are not suitable for every use case: as powerful as they are, they come with real limitations. When considering the integration of AI into a product or workflow, it is essential to evaluate alternative text-processing solutions as well – especially when dealing with non-textual data or other data modalities.
To help you make an informed decision, this post offers a decision guide for textual data. For multimodal data, refer to the previous section and the attached AI Needs Assessment Guide at the end of this post.