For much of the artificial intelligence boom, the industry’s attention was fixed on training: building larger models, feeding them more data and pushing the limits of raw capability. Now, that center of gravity is shifting. What matters increasingly is not just how AI is trained, but how often, how quickly and how intelligently it can be used in the real world.
That is the idea behind what NVIDIA’s chief executive, Jensen Huang, described at GTC as the arrival of the “inference inflection.”
The phrase captures a turning point in the AI market. Systems are no longer valued only for their ability to generate text, images or code in a controlled setting. They are being asked to do more demanding work: to reason through problems, use tools, read files, understand context and carry out productive tasks with a degree of autonomy. In practical terms, that means AI is moving from demonstration to deployment.

And deployment requires inference.
Inference is the phase in which a trained AI model is actually used. It is the moment a system responds to a prompt, analyzes a document, makes a decision, writes code, summarizes a meeting or completes a task. If training is the creation of intelligence, inference is its application. As AI systems become more agentic — breaking problems into steps, calling tools, revising answers and operating across longer chains of reasoning — inference becomes vastly more important, and vastly more expensive.
Huang put it plainly in his keynote: “There’s a reason for that. This fundamental inflection. Finally, AI is able to do productive work and therefore the inflection point of inference has arrived. AI now has to think. In order to think, it has to inference. AI now has to do. In order to do, it has to inference. AI has to read. In order to do so, it has to inference. It has to reason. It has to inference. Every part of AI, every time it has to think, it has to reason. It has to do. It has to generate tokens. It has to inference. It’s way past training now. It’s in the field of inference. So the inference inflection has arrived at the time when the amount of tokens, the amount of compute necessary, increased by roughly 10,000 times.”
That remark goes to the heart of a major change underway in the economics of artificial intelligence. Over the last two years, according to the keynote, compute demand for AI work has grown roughly 10,000-fold, while usage has climbed about 100-fold. Huang suggested that among startups and major AI labs such as OpenAI and Anthropic, the real increase in computing demand may feel closer to one million times over the same period.
That gap matters. It suggests that the next phase of AI will not be defined only by who has the smartest model, but by who can afford to run it at scale.
Inference is becoming the new bottleneck. When AI systems are expected to reason before answering, process more tokens, consult external tools and operate continuously inside products and workflows, the underlying infrastructure has to do much more work per user interaction. A simple chatbot response is one thing. An AI agent that reads documents, plans actions, iterates through options and produces a useful outcome is something else entirely. The second kind of workload consumes far more compute, and therefore far more capital.
That helps explain why NVIDIA has placed such heavy emphasis on this phase of the market. The company designated 2025 as its “year of inference,” with a strategy centered on making sure its infrastructure performs across the full lifecycle of AI — from training to post-training and inference — while extending hardware usefulness and lowering costs for investors. In other words, NVIDIA is not simply selling chips for model creation. It is positioning itself as the central supplier for the operational age of AI.
The market projections cited in the keynote underscored the scale of that bet. Last year, Huang said, there was already strong confidence in demand and purchase orders totaling $500 billion for Blackwell and Rubin systems through 2026. Looking ahead through 2027, he said, he now sees at least $1 trillion in demand, while also suggesting that real computing demand could end up even higher.
Those figures are striking not only for their size, but for what they imply about investor expectations. The AI market is no longer being priced solely around model development. It is increasingly being priced around sustained consumption: the daily, repeated computational load required when AI becomes embedded in search, software, enterprise automation, robotics, science and digital assistants.
This is why the inference inflection point matters. It changes the story of AI from one about invention to one about industrialization.
For startups, it raises the cost of ambition. Building an impressive model is no longer enough; companies must also finance the infrastructure needed to serve real users at high frequency. For cloud providers and chipmakers, it creates an enormous commercial opportunity, because every leap in agentic capability drives more demand for inference hardware. For enterprises, it signals that adopting AI at scale may be more expensive and more operationally complex than many early forecasts assumed. And for the broader market, it suggests that demand for compute could remain intense even if the pace of headline model releases slows.
In that sense, the inference inflection point is not merely a technical milestone. It is a market reset. It marks the moment when AI begins to behave less like a research breakthrough and more like a utility — one that must be delivered constantly, reliably and at great computational cost.
The industry spent the past several years proving that AI could learn. It is now entering a period in which it must prove that AI can work. And if Huang is right, that shift will do more than reshape NVIDIA’s business. It may redefine the economics of the entire AI era.
Image: NVIDIA Keynote Screenshot






