The GPU shortage explained
Demand for AI compute is rising far faster than the world can build the hardware and power to serve it. Here is why GPUs are scarce, in plain language, with real data.
Key takeaways
- AI compute demand has been growing several times faster than the supply of advanced hardware and the power needed to run it.
- Training compute for frontier AI models has grown roughly 4 to 5 times per year since 2010, according to Epoch AI.
- The bottleneck is no longer only chips. Power, data center space, and cooling are now hard limits too.
- Scarcity tends to favor large, organized buyers who commit early, which shapes who gets access first.
Why GPUs became the scarcest resource in technology
A graphics processing unit, or GPU, is the engine of modern artificial intelligence. Training and running AI models is mostly parallel math, the same operation repeated across enormous amounts of data, and GPUs are built to do exactly that kind of work at scale. As AI moved from a research curiosity to a mainstream tool used by hundreds of millions of people, demand for that specific kind of hardware climbed faster than almost anyone planned for.
The trouble is that supply cannot follow at the same speed. Advanced GPUs depend on a small number of chip fabrication plants, specialized high-bandwidth memory, and advanced packaging steps that take years and billions of dollars to expand. You cannot add a factory the way you add a shift at a warehouse. When the whole industry wants the same hardware at the same moment, a shortage is the natural result.
It helps to think of the GPU shortage not as a single problem but as a gap between two curves. One curve is demand for AI compute, which bends upward sharply. The other is the supply of finished, deployable hardware, which rises in slow steps. The space between those curves is the shortage, and for now it keeps widening rather than closing.
What the data shows
4 to 5x
Annual growth in training compute for frontier AI models since 2010, according to Epoch AI.
Source: Epoch AI, May 2024
415 TWh
Electricity used by data centers worldwide in 2024, about 1.5 percent of global electricity consumption, according to the IEA.
Source: International Energy Agency (IEA), April 2025
176 TWh
U.S. data center electricity use in 2023, up from about 58 TWh in 2014, according to Lawrence Berkeley National Laboratory.
Source: Lawrence Berkeley National Laboratory, December 2024
What is driving demand for AI compute
Three forces push demand higher at the same time. First, models keep getting larger. Epoch AI finds that the compute used to train frontier AI models has grown roughly 4 to 5 times per year since 2010, a pace that compounds with startling speed. If that rate holds, a frontier model trained two years from now could use roughly 16 to 25 times the training compute of one trained today.
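The compounding behind that growth rate can be sketched in a few lines of Python. This is a minimal illustration, assuming the 4 to 5 times annual rate simply continues unchanged; the function name `compound_growth` is made up for this example, and the result describes training compute, not a count of chips, since each hardware generation delivers more compute per chip.

```python
def compound_growth(annual_factor: float, years: int) -> float:
    """Total multiplier after `years` of growth at `annual_factor` per year."""
    return annual_factor ** years


# Assumption: the 4x to 5x per year range cited from Epoch AI holds steady.
LOW_RATE, HIGH_RATE = 4.0, 5.0

for years in (1, 2, 3):
    low = compound_growth(LOW_RATE, years)
    high = compound_growth(HIGH_RATE, years)
    print(f"After {years} year(s): {low:,.0f}x to {high:,.0f}x the compute")
```

After two years the range is 16 to 25 times, and after three it is 64 to 125 times, which is why even a fast expansion in supply struggles to keep up.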
Second, far more people use AI every day. That means more inference, the compute needed to actually run a model after it is built. Every search summary, coding assistant, and chat reply is an inference call, and the number of those calls keeps multiplying as AI gets woven into everyday tools. Third, each new hardware generation, such as the NVIDIA Blackwell platform, is designed for ever larger workloads, so the most capable buyers race to secure it the moment it appears.
The International Energy Agency projects that data center electricity demand will more than double to around 945 TWh by 2030, with electricity use by accelerated AI servers growing about 30 percent per year. When the appetite for compute grows that fast, the hardware that provides it stays in short supply even as factories run flat out.
What the shortage looks like inside a data center
The scarcity is easy to picture when you see a full hall of GPU racks. Each rack holds many accelerators, miles of cabling, dedicated power feeds, and a cooling path for the heat. Building one of these halls is a logistics and engineering project, not a purchase, which is part of why finished, running capacity is so hard to come by.
It is not only chips. Power is the new bottleneck
Even when chips are available, they need somewhere to run. That means data center space, high-density power delivery, and serious cooling. In the United States, Lawrence Berkeley National Laboratory estimates that data center electricity use rose to 176 TWh in 2023, up from about 58 TWh in 2014, and could reach 325 to 580 TWh by 2028 depending on how fast the buildout continues.
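To put those figures on a common footing, the sketch below converts them into implied annual growth rates. It assumes smooth compound growth between the quoted years, which is a simplification for illustration and not something the report itself claims; the helper name `cagr` is hypothetical.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the text, in TWh of U.S. data center electricity use.
USE_2014, USE_2023 = 58.0, 176.0
LOW_2028, HIGH_2028 = 325.0, 580.0

historical = cagr(USE_2014, USE_2023, 2023 - 2014)
low_case = cagr(USE_2023, LOW_2028, 2028 - 2023)
high_case = cagr(USE_2023, HIGH_2028, 2028 - 2023)

print(f"2014 to 2023: {historical:.1%} per year")
print(f"2023 to 2028, low case: {low_case:.1%} per year")
print(f"2023 to 2028, high case: {high_case:.1%} per year")
```

Under these assumptions, the historical rate works out to roughly 13 percent per year, and the 2028 range implies anywhere from about the same pace to roughly double it.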
Building that capacity takes years, not weeks. New facilities need land in the right place, a grid connection that can deliver large and steady power, water or other cooling capacity, and skilled teams to operate everything around the clock. Any one of those can become the limiting factor, and increasingly the slowest link is power. A site can have its hardware delivered and installed and still be waiting on a grid connection.
This is why access to professionally hosted GPU capacity has become almost as valuable as the chips themselves. Owning a chip that has nowhere to run is not the same as owning working compute.
Common misconceptions about the shortage
It is just hype
The pressure shows up in measured electricity data, not only headlines. The IEA estimates that data centers used about 415 TWh of electricity in 2024 and projects roughly 945 TWh by 2030, a physical signal of real demand.
More fabs will fix it fast
New fabrication capacity takes years to build and qualify. Memory and advanced packaging have their own limits, so adding one piece does not unlock the whole chain.
Efficiency will cancel demand
Software does get more efficient, but Epoch AI reports algorithmic gains of about 3 times per year, while training compute for frontier models has still grown 4 to 5 times per year. Efficiency softens the curve rather than flattening it.
Scarcity favors the organized
When a resource is scarce, it tends to flow to buyers who can commit early, purchase at scale, and operate the hardware properly. That is part of why large technology companies have moved aggressively to lock up supply, sign multi-year agreements, and build their own facilities. Priority access is itself an advantage, separate from price.
For everyone else, the practical question becomes how to get a real position in AI compute without trying to win a global supply race alone. The gap that matters most is between owning hardware and actually operating it, because a chip only becomes useful compute once it is powered, cooled, monitored, and connected to demand.
There is also a timing dimension. Because allocation rewards buyers who reserve capacity ahead of need, those who act while the shortage is still widening tend to secure positions that latecomers may be unable to buy at any price for some time. Waiting is itself a decision, and in a tight market it carries a cost that is easy to underestimate.
How ownership fits into a scarce market
One response to scarcity is to own the hardware itself rather than rent time on someone else's. Owning a physical GPU machine means holding the scarce asset, while a professional operator handles the demanding parts of running it. This is the idea behind managed GPU ownership: you hold the hardware, and an operations team runs the power, cooling, monitoring, and provider access inside an American data center.
The appeal of that split is practical. Sourcing scarce hardware, securing power, and running it reliably are demanding full-time disciplines, and very few individuals can do all of them well alone. Letting a dedicated team handle operations lets an owner hold the scarce asset without trying to become a data center operator overnight.
If you want to understand that model in detail, our service on managed GPU compute explains how sourcing and operations work together. None of this removes risk. Owning hardware does not guarantee any outcome. Operational benefits are not guaranteed and depend on utilization, uptime, demand, costs, hardware performance, and market conditions.
References and data
- Training compute of frontier AI models grows by 4 to 5x per year. Epoch AI. May 2024.
- Energy and AI. International Energy Agency (IEA). April 2025.
- 2024 United States Data Center Energy Usage Report. Lawrence Berkeley National Laboratory. December 2024.
- NVIDIA Blackwell Platform Arrives to Power a New Era of Computing. NVIDIA Newsroom. March 2024.
Common questions about the GPU shortage
Why is there a GPU shortage?
AI compute demand has grown faster than the supply of advanced GPUs, which depend on a small number of fabrication plants, specialized memory, and advanced packaging. Power and data center capacity are also limited, so even available chips need somewhere to run before they become useful compute.
Will the shortage end soon?
Supply is expanding, but demand keeps rising too. The IEA projects data center electricity demand to more than double by 2030, and Epoch AI reports training compute for frontier models has grown 4 to 5 times per year, so pressure on hardware is likely to continue for some time.
Is the bottleneck chips or power?
Both. Chips remain hard to produce, but power and cooling are now hard limits as well. Lawrence Berkeley National Laboratory projects U.S. data center electricity use could reach 325 to 580 TWh by 2028, and connecting that much new load to the grid takes years.
Why do large buyers get access first?
When supply is tight, manufacturers favor customers who commit early, buy in volume, and operate hardware reliably. Pre-orders and long-term agreements let those buyers reserve scarce units in advance, which leaves less for smaller or one-off purchasers.
Will more efficient software end the shortage?
It helps but does not end it. Epoch AI reports algorithmic efficiency improving about 3 times per year, yet demand for capability grows faster. Efficiency lets models do more with the same chips rather than reducing total demand.
How can someone get access to AI compute?
Options include renting cloud GPU time or owning physical GPU hardware that a professional team operates. Owning means holding the scarce asset itself. Outcomes are never guaranteed and depend on utilization, demand, costs, and market conditions.
Want a real position in scarce AI compute?
Talk through what owning managed NVIDIA GPU hardware would look like, with no pressure and straight answers.