Breaking News
LLM Costs Set to Plunge by 90% by 2030, But Advanced AI May Still Remain Expensive: Gartner
2026-03-26
Gartner has projected a dramatic decline in the cost of running large language models (LLMs), estimating that by 2030, inference expenses for trillion-parameter models could fall by more than 90% compared to 2025 levels.
The research firm also noted that LLMs could become up to 100 times more cost-efficient than comparable models built in 2022, driven by rapid advancements across hardware, software, and infrastructure.
At the core of this cost shift are improvements in semiconductor technology, more efficient data center infrastructure, better model architectures, and higher chip utilization. The increasing use of inference-optimized silicon and the growing role of edge computing in handling specific workloads are also expected to play a significant role.
AI systems process information in units known as tokens—small chunks of data roughly equivalent to a few characters of text. While the cost per token is expected to decline sharply, the overall economics of AI may not become as favorable as that decline suggests.
According to Will Sommer, the falling cost of tokens will not necessarily translate into lower expenses for enterprise users. One reason is the rise of more advanced AI systems, particularly “agentic” models that can independently perform complex, multi-step tasks. These systems consume significantly more tokens—often between five and thirty times more per task than standard chatbot-style applications.
As a result, even though the unit cost of processing data is decreasing, total usage is expected to grow at a faster pace, potentially increasing overall inference spending rather than reducing it.
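The arithmetic behind this point is simple to illustrate. A minimal sketch follows; every dollar figure and token count is a made-up assumption for illustration, not a number from the Gartner report—only the 90% price decline and the 5–30x agentic multiplier come from the article.

```python
# Hypothetical illustration of why cheaper tokens can still mean higher bills.
# Prices and token counts are invented; only the 90% decline and the
# ~20x agentic multiplier (mid-range of 5-30x) reflect figures in the article.

COST_PER_M_TOKENS_2025 = 10.00                           # assumed $ per million tokens, 2025
COST_PER_M_TOKENS_2030 = COST_PER_M_TOKENS_2025 * 0.10   # 90% decline by 2030

CHATBOT_TOKENS_PER_TASK = 2_000                          # assumed simple chat-style request
AGENTIC_TOKENS_PER_TASK = CHATBOT_TOKENS_PER_TASK * 20   # mid-range of the 5-30x multiplier

def task_cost(tokens: int, cost_per_m: float) -> float:
    """Dollar cost of one task at a given per-million-token price."""
    return tokens / 1_000_000 * cost_per_m

spend_2025 = task_cost(CHATBOT_TOKENS_PER_TASK, COST_PER_M_TOKENS_2025)
spend_2030 = task_cost(AGENTIC_TOKENS_PER_TASK, COST_PER_M_TOKENS_2030)

print(f"2025 chatbot task: ${spend_2025:.3f}")  # per-task spend before the decline
print(f"2030 agentic task: ${spend_2030:.3f}")  # per-task spend doubles despite 90% cheaper tokens
```

Under these assumptions, per-task spending doubles even though each token costs a tenth as much—the usage growth outruns the price decline.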
Gartner’s analysis outlines two possible scenarios shaping future costs: one based on cutting-edge semiconductor technologies and another reflecting a broader mix of available hardware. The latter is expected to remain more expensive due to lower computational efficiency.
The firm cautions that cheaper tokens alone will not democratize access to advanced AI capabilities. High-end reasoning systems will continue to demand substantial computational resources, making them relatively scarce and expensive despite broader efficiency gains.
Instead, value is likely to shift toward platforms that can intelligently distribute workloads across different types of models. Routine, high-volume tasks can be handled by smaller, specialized models that offer better cost efficiency, while more complex and high-value operations will rely on larger, frontier models used selectively.
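That routing idea can be sketched in a few lines. This is a toy illustration, not a real platform's API: the model names, prices, and the complexity threshold are all assumptions, and the upstream complexity score is taken as given.

```python
# Minimal sketch of tiered model routing: routine work goes to a small,
# cheap model; complex, high-value work escalates to a frontier model.
# Model names, prices, and the 0.7 threshold are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_m_tokens: float  # $ per million tokens (hypothetical)

SMALL = Model("small-specialized", 0.50)
FRONTIER = Model("frontier-reasoning", 15.00)

def route(task_complexity: float) -> Model:
    """Pick a model tier from a complexity score in [0, 1].

    How the score is produced (a classifier, heuristics, user intent)
    is outside this sketch; the point is that most traffic should land
    on the cheap tier, with the frontier model used selectively."""
    return SMALL if task_complexity < 0.7 else FRONTIER

print(route(0.2).name)  # routine task -> small-specialized
print(route(0.9).name)  # complex task -> frontier-reasoning
```

The design choice the article points at is exactly this separation: the routing layer, not the per-token price, determines the blended cost of serving a workload.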
The report advises product leaders to focus on optimizing system architecture rather than relying solely on declining token costs, warning that inefficient designs masked by temporarily lower costs today could become significant constraints as AI systems scale in complexity.