In recent years, the parameter count of language models has steadily increased, with the GPT series, for example, growing from 1.5 billion parameters in GPT-2 to an estimated 1 trillion in GPT-4. While companies like OpenAI and DeepMind have focused on scaling these models up, the improvements haven’t provided a clear path to Artificial General Intelligence (AGI), although there have been some sparks…
The increasing size of these models is putting cutting-edge AI research beyond the reach of many AI labs, making it difficult for external researchers to reproduce and study them. This restricts their ability to investigate potential safety concerns and ties companies to the dataset and model design choices of industry leaders like OpenAI.
Additionally, the pace of innovation in the GPU chips used to run AI models is lagging behind the growth in model sizes. This discrepancy could soon lead to a point beyond which scaling cannot plausibly continue.
As a result, smaller language models, such as StableLM, LLaMA, and Alpaca, are gaining popularity. These models can achieve comparable performance to their larger counterparts at a fraction of the size. For example, LLaMA 13B matches or outperforms GPT-3 175B on many benchmarks despite being more than 13 times smaller.
Another technique being explored to boost efficiency is training multiple smaller sub-models that are specialised for specific tasks instead of relying on a single large model for all tasks.
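As a rough illustration of that idea, the sketch below routes each request to a small task-specific model instead of sending everything to one general-purpose model. The model names and the keyword-based router are placeholder assumptions, not a production design.

```python
# Minimal sketch: route each request to a smaller task-specific sub-model
# instead of relying on a single large model for all tasks.
# The model names and keyword rules below are illustrative placeholders.

TASK_MODELS = {
    "summarise": "small-summariser-1b",  # hypothetical 1B-parameter summariser
    "code":      "small-coder-3b",       # hypothetical 3B-parameter code model
    "chat":      "small-chat-7b",        # hypothetical 7B-parameter chat model
}

def pick_model(prompt: str) -> str:
    """Choose a specialised sub-model using crude keyword matching."""
    text = prompt.lower()
    if "summarise" in text or "tl;dr" in text:
        return TASK_MODELS["summarise"]
    if "function" in text or "bug" in text or "def " in prompt:
        return TASK_MODELS["code"]
    return TASK_MODELS["chat"]  # default: general-purpose chat model

if __name__ == "__main__":
    for prompt in ["Summarise this article for me",
                   "Why does this function raise a KeyError?",
                   "What's a good name for a cat?"]:
        print(f"{prompt!r} -> {pick_model(prompt)}")
```

In practice the routing step can itself be learned (as in mixture-of-experts systems), but even a simple dispatcher like this captures the core idea: most requests don’t need the largest model available.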
Focusing on smaller models is also cost-effective, and most users prioritise cost over the absolute best performance. The GPT-4 API can be prohibitively expensive, discouraging developers from building applications on it or using it for personal projects.
Smaller models are also more manageable: they can be installed locally on devices, reducing dependency on external services and protecting data privacy (see the sketch below). Companies can also train customised models on top of smaller open-source ones, keeping their data in-house.
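As a hedged example of local deployment, the snippet below loads a small open-source checkpoint with the Hugging Face transformers library and generates text entirely on-device, so no prompt ever leaves the machine. The specific StableLM checkpoint is just an illustration; any similarly sized open model would work, subject to local hardware limits.

```python
# Minimal sketch: run a small open-source model entirely on a local machine,
# so prompts and data never leave the device. Assumes the Hugging Face
# `transformers` library; the checkpoint name is only an example.

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "stabilityai/stablelm-base-alpha-3b"  # example small checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

prompt = "The main advantage of running a model locally is"
inputs = tokenizer(prompt, return_tensors="pt")

# Generation happens on-device: no API call, no data sent to a third party.
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```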
While large models have dominated the AI landscape in recent years, the focus is shifting towards smaller, more efficient models. With cost-effectiveness, practicality, and data security as primary concerns, smaller models seem to be a solid bet.