From smarter models to scalable systems
For years, the AI industry has focused on training larger and more powerful underlying models. But as 2026 approaches, the challenge has shifted to delivering these systems efficiently to millions of users in real time. Reports such as Deloitte's TMT Predictions 2026 show that inference workloads already account for approximately two-thirds of total AI computing spending, underscoring how running AI systems at scale has become a defining challenge.
Growing computing demand
The industry also faces critical infrastructure constraints. The GPU supply chain is tight, with wait times stretching to nearly a year. At the same time, high-bandwidth memory remains limited and global data center expansion struggles to keep pace with demand. Only a small portion of the large capacity planned for 2026 is currently under construction, underscoring the widening gap between AI demand and available resources.
Benefits of the AI flywheel
Suleyman’s approach revolves around what he describes as a powerful “flywheel” effect. High-margin AI products, such as enterprise tools and subscription-based software, can command prices high enough to cover expensive inference.
For companies like Microsoft, this creates a self-reinforcing cycle:
- Invest heavily in infrastructure
- Deliver faster, more reliable AI services
- Attract more users
- Generate higher profits
- Reinvest in better systems
Challenges for small players
Not all organizations have the financial capacity to absorb these rising costs. Startups and consumer AI platforms often operate on tight budgets, making it difficult to secure premium computing resources. This constraint can degrade performance quality, response times, and user engagement. Without sufficient capital, smaller players may struggle to compete with large companies that dominate infrastructure investment.
Billions of dollars are being invested in AI infrastructure
To stay ahead, Microsoft is investing heavily in AI infrastructure, reportedly spending more than $80 billion annually. This level of spending highlights how important computing power has become in shaping the future of artificial intelligence.
Decisive changes in the AI industry
Suleyman’s perspective reflects major changes in the industry. The next stage of the AI race may depend less on creating the most intelligent systems and more on delivering them efficiently at scale. As computing costs continue to rise and resources remain limited, financial strength and access to infrastructure will likely become the deciding factors that reshape the future of AI.
FAQ:
Q1. What did Mustafa Suleyman say about the future of AI?
He said computing costs, more than model intelligence, will shape the industry. This means infrastructure and affordability will determine success.
Q2. What is inference compute in AI?
It refers to the computing resources required to run a trained AI model in real time for users. This is different from training the model, which is a one-time, upfront cost.
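The distinction above can be made concrete with some back-of-the-envelope arithmetic. The sketch below uses entirely hypothetical figures (GPU-hour prices, request volumes, per-token rates are illustrative assumptions, not reported numbers) to show why recurring inference spend eventually dwarfs a one-time training bill:

```python
# Illustrative sketch: training is paid once, inference is paid continuously.
# All figures are hypothetical assumptions for illustration only.

def training_cost(gpu_hours: float, price_per_gpu_hour: float) -> float:
    """One-time cost to train a model."""
    return gpu_hours * price_per_gpu_hour

def monthly_inference_cost(requests_per_day: float,
                           tokens_per_request: float,
                           price_per_million_tokens: float) -> float:
    """Recurring cost to serve the model to users (30-day month)."""
    monthly_tokens = requests_per_day * tokens_per_request * 30
    return monthly_tokens / 1_000_000 * price_per_million_tokens

# Hypothetical service: modest model, steady user traffic.
train = training_cost(gpu_hours=100_000, price_per_gpu_hour=2.0)   # $200,000 once
serve = monthly_inference_cost(requests_per_day=1_000_000,
                               tokens_per_request=1_000,
                               price_per_million_tokens=1.0)       # $30,000/month

# After roughly 7 months, cumulative inference spend exceeds the
# one-time training cost, and it keeps growing with usage.
print(f"training (once):     ${train:,.0f}")
print(f"inference (monthly): ${serve:,.0f}")
```

Under these assumptions the serving bill overtakes the training bill in well under a year, which is the dynamic behind inference dominating total AI computing spend.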
