Why GPUs are hard to get
Advanced GPUs are not like ordinary products you can simply make more of. Here is what the supply chain actually looks like, and why it stays tight.
Key takeaways
- Advanced GPUs depend on a handful of fabrication plants, specialized memory, and advanced packaging that cannot scale overnight.
- Training compute for frontier AI models has grown by roughly 4 to 5x per year since 2010, according to Epoch AI, far faster than supply can expand.
- Each step in the chain has its own bottleneck, so fixing one does not fix the whole shortage.
- Scarcity tends to favor large buyers who can commit early and at scale.
Why you cannot just make more GPUs
When most products run short, factories add shifts or build another line. Advanced GPUs do not work that way. They sit at the end of one of the most complex supply chains ever built, where a single chip passes through dozens of specialized steps that took years and billions of dollars to develop.
That complexity is the core reason GPUs are hard to get. Even when demand is obvious and buyers are willing to pay almost anything, the chain can only move as fast as its slowest, most specialized link. Money cannot conjure a new fabrication plant in a quarter, and skilled capacity cannot be hired into existence overnight.
It also means the shortage is not a glitch that a single decision can fix. It is a structural feature of how the most advanced computing hardware in the world is made.
Three bottlenecks that keep GPUs scarce
Fabrication
The most advanced chips are made at a small number of leading-edge fabs. Building a new fab takes years and enormous capital, so capacity cannot expand on short notice.
Memory
AI accelerators rely on high-bandwidth memory stacked beside the chip. This specialized memory has its own limited supply and its own production constraints.
Advanced packaging
Combining the chip and memory into one package uses advanced packaging steps that are themselves in short supply, creating a bottleneck even when chips and memory exist.
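The "slowest link" idea behind these three bottlenecks can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real supplier: the stage names come from the section above, while the monthly capacity figures and the `chain_throughput` helper are hypothetical placeholders chosen only to show the arithmetic.

```python
# Illustrative only: the capacity numbers are made-up placeholders, not real data.

def chain_throughput(capacities: dict[str, int]) -> tuple[str, int]:
    """Return the limiting stage and the number of finished units it allows."""
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]

# Hypothetical units per month that each stage could support on its own.
capacities = {"fabrication": 120, "memory": 100, "packaging": 80}
print(chain_throughput(capacities))  # ('packaging', 80)

# Expanding only packaging shifts the limit to the next-slowest stage.
expanded = {**capacities, "packaging": 150}
print(chain_throughput(expanded))  # ('memory', 100)
```

Under these assumed numbers, nearly doubling packaging capacity lifts finished output only from 80 to 100 units, because memory becomes the new limit.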
How one accelerator comes together
Following a single accelerator from start to finish shows why each stage is a possible choke point. Raw silicon wafers are patterned at a leading-edge fab in a process that can take months. Separately, high-bandwidth memory is manufactured and tested by a small group of suppliers. Only specialized facilities can then bond the logic chip and the memory stacks into one tightly integrated package.
From there the package is mounted on a board, combined with power delivery and cooling hardware, tested, and finally assembled into a server. A delay or quality issue at any step holds up everything downstream. Because the same few suppliers feed the entire industry, a constraint at one of them ripples out to every buyer at once.
The capital and expertise behind each stage are part of why new entrants cannot simply appear. Leading-edge fabrication tools cost a fortune and take years to install and tune. The engineering knowledge to run them at high yield is rare and closely held. So even with strong demand and willing investors, the barrier to adding a genuinely new source of supply is enormous, which keeps the existing bottlenecks firmly in place.
From a single chip to a full cluster
A finished accelerator is just the beginning. To do useful work it must be built into servers, then racked, networked, powered, and cooled as part of a cluster. Every stage adds its own lead time, which is why the gap between ordering chips and running compute can stretch into many months.
How fast demand is outrunning supply
Demand compounds faster than the chain can grow
Epoch AI finds that the compute used to train frontier AI models has grown by roughly 4 to 5x per year since 2010. A supply chain that needs years to add capacity cannot keep pace with demand that multiplies several times over every year. The two are running on completely different clocks.
Better software helps, but not enough. Epoch AI also reports that algorithmic efficiency improves by about 3x per year, meaning models do more with the same hardware. Even so, that gain has not closed the gap, because appetite for capability grows faster than efficiency gains can offset it. When a model becomes cheaper to run, buyers tend to run larger or more numerous models rather than stopping at the old level.
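A quick compounding calculation shows how far apart those two clocks can drift. The 4x and 5x growth rates are the Epoch AI figures quoted above. The three-year build time (`BUILD_YEARS`) is purely an assumption for illustration, and `demand_multiple` is a hypothetical helper.

```python
def demand_multiple(annual_growth: float, years: float) -> float:
    """Factor by which training compute grows after `years` of compounding."""
    return annual_growth ** years

BUILD_YEARS = 3  # assumption: time to bring new capacity online, for illustration only

for growth in (4, 5):
    multiple = demand_multiple(growth, BUILD_YEARS)
    print(f"{growth}x per year over {BUILD_YEARS} years -> {multiple:.0f}x")
# 4x per year over 3 years -> 64x
# 5x per year over 3 years -> 125x
```

Under that assumed build time, training compute would have grown 64 to 125 times over by the time capacity started today comes online.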
Why the chain is so concentrated
Part of what makes GPUs hard to get is how few places can perform each critical step. Leading-edge fabrication is dominated by a small number of facilities, advanced memory by a handful of suppliers, and advanced packaging by an even narrower set of lines. There is little redundancy, so there is little slack when demand spikes.
This concentration has consequences beyond price. It makes the whole chain sensitive to disruption, whether from a natural disaster, an equipment shortage, or a policy change in one region. A single constrained step can hold back the entire industry at once, which is why supply feels fragile even when factories are running flat out.
It also means capacity cannot be added in small increments. The facilities involved are among the most expensive and complex ever built, so expansion happens in large, slow, capital-heavy steps rather than gradual adjustments. That is the structural reason the shortage resists a quick fix.
What scarcity means for getting access
When a resource is this hard to make, it flows first to buyers who can commit early, buy at scale, and operate the hardware properly. For everyone else, the practical question is how to hold a real position in GPU compute without trying to win a supply race alone against the largest companies in the world.
One answer is managed ownership, where you own physical GPU hardware that a professional team sources, hosts, and operates. The value of working through an operator is partly about navigating exactly the supply chain described above. Procurement relationships, volume, and timing all matter when hardware is allocated rather than simply sold, and those are hard for an individual to assemble alone.
Our service on managed GPU compute explains how that works. It does not remove risk from the picture. Owning hardware does not guarantee any result. Operational benefits are not guaranteed and depend on utilization, uptime, demand, costs, hardware performance, and market conditions.
References and data
- Training compute of frontier AI models grows by 4 to 5x per year. Epoch AI. May 2024.
Common questions about GPU supply
Advanced GPUs depend on a small number of leading-edge fabs, specialized high-bandwidth memory, and advanced packaging. Each step took years and huge investment to build, so capacity cannot expand quickly even when demand is clear.
There is no single bottleneck. Fabrication, memory, and advanced packaging each have their own limits, so easing one does not solve the whole shortage. The chain moves only as fast as its slowest specialized link.
Beyond making the chip, the package must be built into servers, racked, networked, powered, and cooled before it does useful work. Each stage adds lead time, so the gap between ordering and running compute can stretch into many months.
The same few suppliers feed the entire industry, especially for memory and advanced packaging. A constraint at one of them ripples out to every buyer at once, which is why the shortage feels broad rather than isolated.
Supply is expanding, but demand keeps rising fast. Epoch AI reports training compute has grown 4 to 5 times per year since 2010, far quicker than the chain can add capacity, so tightness is likely to persist.
Working with an operator that sources hardware professionally and lets you own it is one route. In the managed ownership model, the operator handles procurement and running the hardware while you hold the physical machine, though outcomes are never guaranteed.
Want a real position in scarce GPU hardware?
Talk through what owning managed NVIDIA GPU hardware would look like, sourced and operated by professionals.