Battle for the AI Data Center: Deep Dive on the Semiconductor Supercycle
Stacy Rasgon (senior semiconductor analyst, Bernstein) x TechSurge podcast
Stacy Rasgon has covered semiconductors at Bernstein for 18 years, came up as an MIT-trained engineer who built chip equipment before he wrote a note, and says he has heard the word supercycle his entire career while maybe seeing one real one only now.
The conversation is less a stock pitch than a map of where the AI buildout actually binds: memory, packaging, equipment, and eventually power, and who collects the margin while each one is scarce. The numbers are large and the constraints are physical.
He has heard “supercycle” his whole career and thinks this may be the first real one.
Rasgon frames the AI episode as a demand cycle rather than the usual memory supply cycle, and says the magnitude and the speed at which it ripped are something close to unprecedented in his career. The plain version is his recurring line, that everything he hears says nobody has enough compute, and there is no sign yet of a peak.
There is more than one kind of semiconductor cycle, and naming them tells you what this is.
Rasgon walks through the distinct cycle types before placing the current one as a demand cycle, the rare kind, now running at a scale he has not seen:
Supply cycle: supply gets tight, prices rise, everyone adds capacity, the demand turns out to have been over-ordered, supply floods in as demand falls, and you gap out. These are the scary four-year peak-to-trough cycles, most familiar in memory.
Inventory cycle: semiconductors sit at the back of the supply chain, so small swings in end demand propagate backward and get amplified, showing up as resets that last a few quarters as customers under-ship then over-ship to refill.
Product cycle: a vendor wins or loses a specific socket, like the chip in the next iPhone, which swings its revenue up or down regardless of the broader market.
Demand cycle: a genuine surge in end demand, the kind happening now, which he says he has never seen on this scale or at this speed.
The constraint does not sit still, it moves from one category to the next.
As AI compute demand climbed, Rasgon says, the industry used up all the available supply in one category after another, each going into constraint mode in turn. The leverage in a supply-constrained boom sits wherever supply is tightest, and that spot keeps moving:
Accelerators and GPUs first
Then memory
Then semi-cap tools
Then networking and optical
Then power semis
And now even CPUs
This is probably the strongest memory cycle in history, a year and a half after the worst one since the tech bubble.
Rasgon says memory prices are roughly doubling every quarter with demand through the roof, and contrasts it with 18 months ago, when the industry sat in its worst memory cycle since the tech-bubble days. The structure is healthier now, with three to six makers per memory type instead of the 30 of two or three decades ago, so the down-cycles bring losses rather than the bankruptcies of the old days.
An AI chip is mostly memory, and HBM eats four times the silicon of normal memory.
Rasgon says that if you add up the silicon area in an AI chip package it is probably 85% or more high-bandwidth memory. There is also a trade ratio working against supply: because of stacking, lower yields, extra logic dies, and the space needed for connectors, making a gigabyte of HBM takes roughly four times the silicon area of a gigabyte of standard DRAM. The consequence is counterintuitive, that the industry can add a lot of DRAM wafer capacity and still not add many usable bits.
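The trade ratio is easier to see with numbers. Below is a minimal Python sketch, assuming the roughly 4x ratio Rasgon cites and measuring output in units of one wafer of standard DRAM. The wafer-growth and HBM-share inputs are hypothetical, chosen only for illustration, and the function names are not from the interview.

```python
def usable_bits(wafers: float, hbm_share: float, trade_ratio: float = 4.0) -> float:
    """Bits produced, in units of 'one wafer of standard DRAM'.

    hbm_share is the fraction of wafers devoted to HBM (hypothetical input).
    trade_ratio is the silicon area per HBM gigabyte relative to standard DRAM.
    """
    standard_bits = wafers * (1.0 - hbm_share)
    hbm_bits = wafers * hbm_share / trade_ratio  # each HBM wafer yields 1/trade_ratio the bits
    return standard_bits + hbm_bits


def bit_growth(wafer_growth: float, share_before: float, share_after: float,
               trade_ratio: float = 4.0) -> float:
    """Fractional change in total bits when wafer capacity and HBM mix both change."""
    before = usable_bits(1.0, share_before, trade_ratio)
    after = usable_bits(1.0 + wafer_growth, share_after, trade_ratio)
    return after / before - 1.0


if __name__ == "__main__":
    # Hypothetical scenario: 20% more wafers, HBM mix rising from 10% to 30% of wafers.
    print(f"bit growth: {bit_growth(0.20, 0.10, 0.30):.1%}")
```

With those made-up inputs, 20% more wafers yields only about 0.5% more bits, because every wafer shifted to HBM produces a quarter of the bits it would have as standard DRAM.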
Moore’s Law broke, the cost leg fell off, and the industry got healthier for it.
Rasgon describes Moore’s Law as a three-legged stool of cost, performance, and power, where for decades you got twice the transistors every two years at the same cost. The cost leg went out the window more than ten years ago, and cost per transistor began rising instead of falling. His read is that this opened a renaissance, because customers now pay for improvements the industry used to give away, and packaging and chiplets carry the gains the shrink no longer does. That read echoes what Broadcom’s then-CTO Henry Samueli argued around a 2012 analyst day, when TSMC’s cost per transistor bottomed near the 28-nanometer node.
Hyperscaler vertical integration is real but not new, and it is happening for several reasons at once.
Rasgon pushes back on the idea that custom silicon is a fresh phenomenon, noting Google has built its own TPU with Broadcom for about 14 years and is on its eighth generation, Amazon has run its own AI chip, Trainium, and the Graviton server CPU for years, and Apple has done its own silicon for well over a decade. On why they do it, he says it is all of the above:
Bottlenecks, to avoid being fully dependent on supply they do not control
Performance, because a large stable internal workload can be optimized very tightly on a custom design
Total cost of ownership and margin, to avoid paying the full merchant price
Supplier leverage, so they are not dependent on a single supplier like NVIDIA
A small group of suppliers plus heavy demand equals durable pricing power.
Rasgon notes the industry passed COVID-era input-cost inflation straight through at its margin structure, so a company with a 60% gross margin facing a 100-dollar cost increase passed along 160, and not one of the companies he covers complained of margin compression. Across the cycle the effect compounds: comparing 2019 to 2024, total industry units were roughly flat while collective average selling prices ran about 50% higher.
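The arithmetic in this passage can be separated into two pieces. Below is a small Python sketch; the function names are illustrative and not from the interview. Read literally, the 160 figure matches a 60% markup applied to the added cost, while strictly holding a 60% gross margin on the increment would imply a larger price increase. The last function covers the flat-units, higher-price comparison.

```python
def price_increase_with_markup(cost_increase: float, markup: float) -> float:
    """Price increase when the added cost is marked up by a fixed percentage of cost."""
    return cost_increase * (1.0 + markup)


def price_increase_holding_gross_margin(cost_increase: float, gross_margin: float) -> float:
    """Price increase needed so the increment itself earns the same gross margin.

    Gross margin is (price - cost) / price, so price = cost / (1 - gross_margin).
    """
    return cost_increase / (1.0 - gross_margin)


def revenue_change(unit_change: float, asp_change: float) -> float:
    """Fractional revenue change from fractional changes in units and average selling price."""
    return (1.0 + unit_change) * (1.0 + asp_change) - 1.0


if __name__ == "__main__":
    print(price_increase_with_markup(100.0, 0.60))           # the 160 case from the text
    print(price_increase_holding_gross_margin(100.0, 0.60))  # strict 60% gross margin
    print(f"{revenue_change(0.0, 0.50):.0%}")                # flat units, ASP up about 50%
```

Either way the point stands: the added cost is passed through with margin on top, and with units roughly flat, a 50% rise in average selling price is a 50% rise in revenue.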
Investors have been trading the bottleneck, and the two halves of that trade do not agree.
Rasgon observes that the compute names stagnated for a stretch while the memory, semi-cap, and optical names ripped, as money chased whichever constraint was binding. He flags the tension directly, that those two trends are not really consistent with each other, so the market probably has to normalize one way or the other at some point.
You do not get paid for training a model, you get paid for using it, so the pivot to inference has to happen.
Rasgon says a lot of the spend so far has been training, which is necessary but earns nothing on its own, while inference is where the money is. He is starting to see the pivot in the data, including CPUs and other non-GPU parts getting constrained because inference uses far more than just GPUs, and he points to agentic coding as the biggest emerging application. He adds the sober counterpoint, that some companies laying off workers to spend on AI are now spending more on tokens than they were on the employees the tokens were meant to replace.
Not all tokens are worth the same, which is why NVIDIA bought Groq.
Rasgon relays Jensen Huang’s point from GTC that a token is not a token is a token, that low-latency, fast-response tokens can be monetized at a far higher rate, and that this is a job a general-purpose GPU is not best suited for. Rasgon points to NVIDIA’s acquisition of Groq as exactly that hole being filled, and to him it validates the inference-specialist category rather than threatening it.
The host’s frame: 50 companies become three, and AI is mid-shakeout.
Marks, the host, argues that when the PC arrived there were 50 companies and now there are three, that the cell phone did the same, and that the AI revolution is running through that consolidation now. Rasgon declines to speculate on specific M&A but treats the structural read as sound, which is part of why the assets the inference startups hold get bought rather than built.
Equipment is having a strong year, but a physically constrained one.
Rasgon expects a strong year for wafer fab equipment, then qualifies it: the tools need fabs and clean rooms, the so-called shells, to go into, and those have to be built first, so this year is capped and the new clean rooms start accepting shipments next year. The big five equipment makers, ASML, Applied Materials, Lam Research, Tokyo Electron, and KLA, are collectively roughly 70% or more of total equipment spend. Their stocks have risen far less than memory: by Rasgon’s recollection Lam went from about 70 dollars early last year to roughly 280, about a 4x move, versus the 10x moves in memory.
He does not think this is a bubble yet, partly because the industry physically cannot ramp that fast.
Rasgon says the industry is not anywhere near crazy enough to call a bubble, and that the hard physical limit on how fast clean-room and wafer capacity can come online is itself a mitigating factor. If clean rooms were unlimited the industry would be shipping far more equipment this year, and whether that would be too much he cannot say, but the natural constraint slows the ramp either way.
Export controls keep encouraging exactly the Chinese capability they were meant to limit.
Rasgon names two primary Chinese memory players, CXMT in DRAM and YMTC in NAND, constrained by US blacklists that block US equipment sales to them. He extends the point to equipment and AI chips: limits on US sellers have let Chinese toolmakers take more share than they otherwise would, and with US AI chips off limits, China’s local chipmakers brute-force big clusters with abundant power and little concern for energy efficiency.
Broadcom turned a billion-dollar custom-chip business into a hundred-billion-dollar AI line, and the GPU-versus-ASIC fight is the wrong question.
Rasgon describes Broadcom before the AI run as roughly 60% semiconductors and 40% software at high margins, with a custom-silicon business that was maybe a billion dollars a year. On its last earnings call it guided to AI revenue in excess of $100 billion next year, a figure Rasgon expects it to beat. He likes both the GPU and the custom-chip, or ASIC, sides, and reframes the debate everyone wants to have:
ASICs are probably mid-teens percent of AI chip revenue today and could plausibly become 25 to 30 percent of a much bigger pie
Custom chips win for large, stable, internally developed workloads where the volume amortizes the design cost
GPUs keep the edge on flexibility, because a workload change forces a new ASIC design
The right question is not who wins but whether the opportunity ahead is still bigger, because if it is, both thrive, and if it is not, both are in trouble
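The share-versus-pie point above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Rasgon's model: "mid-teens" is taken as 15%, everything that is not an ASIC is treated as the GPU side, and the function names are hypothetical.

```python
def segment_revenue(pie: float, asic_share: float) -> tuple[float, float]:
    """Split total AI chip revenue into (ASIC, non-ASIC) for a given ASIC share."""
    asic = pie * asic_share
    return asic, pie - asic


def breakeven_pie_multiple(share_now: float, share_later: float) -> float:
    """How much the total pie must grow for non-ASIC revenue to stay flat
    while the ASIC share rises from share_now to share_later."""
    return (1.0 - share_now) / (1.0 - share_later)


if __name__ == "__main__":
    # Assumed starting share of 15%, rising to the 25-30% range mentioned above.
    for later in (0.25, 0.30):
        multiple = breakeven_pie_multiple(0.15, later)
        print(f"ASIC share 15% -> {later:.0%}: pie must grow {multiple:.2f}x for the rest to hold flat")
```

Under these assumptions the pie only has to grow by roughly 1.13x to 1.21x for the non-ASIC side to hold its revenue, so if the market becomes much bigger, both sides grow even as ASICs take share.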
On Intel, the foundry strategy was right and the execution was the problem.
Rasgon says foundry matters for national security and supply, so Pat Gelsinger’s strategy was not wrong, but his execution was poor, including, in Rasgon’s telling, hiring 21,000 people and then having to fire them. He rates new CEO Lip-Bu Tan as the right fit: technical, an under-promiser, and focused on fixing the cost structure. By Rasgon’s account Tan essentially founded SMIC, sat on its board for about 20 years, and can call CC Wei or Jensen Huang directly. The new shareholder base of the US government, NVIDIA, and SoftBank has strengthened the balance sheet enough to take the carve-up-for-parts risk off the table, even though Rasgon dislikes the government stake.
The hardest 2030 problem may not be making the silicon, it may be powering it.
Rasgon says Jensen Huang’s once-crazy-sounding figure of 3 to 4 trillion dollars a year of AI infrastructure spend looks less of a stretch now that annual spend is getting close to a trillion, but the binding constraint shifts to power. A Bernstein model he built with an electrical-equipment colleague implied US electrical capacity would have to grow about 5% a year for a decade to support the forecast, which the power analyst called unachievable, pushing the answer toward onsite generation and restarts like Three Mile Island.
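To see why 5% a year sounds modest to a semiconductor analyst and enormous to a power analyst, it helps to compound it. Below is a minimal Python sketch using only the 5% rate and ten-year horizon from the passage; the function name is illustrative.

```python
def capacity_multiple(annual_growth: float, years: int) -> float:
    """Total capacity relative to today after compounding annual_growth for `years` years."""
    return (1.0 + annual_growth) ** years


if __name__ == "__main__":
    multiple = capacity_multiple(0.05, 10)
    print(f"capacity after 10 years: {multiple:.2f}x today's")
    print(f"net addition: {multiple - 1.0:.0%} of today's capacity")
```

Compounded over a decade, 5% a year works out to about 1.63 times today's capacity, meaning the grid would need to add roughly 63% of its current size in ten years.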
He is most encouraged that real chip startups exist again.
Rasgon says he shelved a piece on venture capital in semiconductors about ten years ago because there was almost none, with chip investing then dominated by corporate arms like Intel Capital and Qualcomm Ventures, since designing a chip cost far more time and money than spinning up a SaaS company. Regardless of how any single startup turns out, he says it is about time silicon innovation came back.
Asked what people are missing, he points past the chips to whether normal people actually use this.
Pressed on the risks the up-and-to-the-right crowd is ignoring, Rasgon’s central concern is monetization, how the spend gets paid back and how many foundational-model companies actually survive. What he most wants to see a year out is not engineers but ordinary people using AI inference in ways that add real value to their lives, because that is the demand the entire buildout is betting on.
Why he is still doing this after 18 years.
Rasgon says his rule has always been to do the job until it stops being fun and then go find something else, that he loves the industry, and that there is always something new in it. It is the same disposition that runs through the interview, an engineer reading capital flows through the physics rather than the headlines.
High-leverage quotes
And what’s really interesting, there’s a couple of different types of semiconductor cycles, right? There are supply cycles. You see these in memory a lot. Prices, supply gets tight, prices go up, you add a bunch of capacity, capacity comes online, it turns out the demand you were building the supply for was not real, because customers, when they can’t get the parts that they want, they tend to order even more. And so supply comes online, demand falls like that, you gap out. Those are your sort of scary typical four-year peak-to-trough kind of cycles. This is like a demand cycle, it seems to be. I’ve never seen one on this scale before. The magnitude and rapidity of the speed at which this has actually ripped is something that I think is somewhat unprecedented.
Stacy Rasgon, [02:26]-[04:11]. The core line is the cycle taxonomy landing on “this is like a demand cycle,” the rare kind, at a scale and speed he calls unprecedented.
If you were to think about the wafer or silicon area in an AI chip, it’s probably 85% plus HBM if you were to add up all of the chip area itself. So it uses a lot of memory. And so, not only is demand super strong, but there’s another issue. They call it a trade ratio. Because of all the stacking and everything, you have lower yields. There’s more logic dies that go into it, and you need to leave space on the dies to put the connectors so you can stack the chips together. So you need something like four times as much silicon area to make a gigabyte of high bandwidth memory as you do to make a gigabyte of standard DRAM. And so you’re in a scenario where we could in theory be adding a bunch of wafer capacity to make DRAM, but not actually adding that many bits because of this.
Stacy Rasgon, [07:57]-[08:56]. The core line is the four-times trade ratio, the physical reason adding DRAM capacity does not add proportional bits, which is the heart of the memory squeeze.
Moore’s Law started breaking down over 10 years ago. But it didn’t mean that you couldn’t continue this technology. What it really meant is the cost leg of that three-legged stool was going out the window. So we can still get performance and power improvements, but now you have to pay for it. The cost per transistor was now going up instead of down, which was a new thing, and people were very worried that would be the end of the industry. And it really wasn’t. It opened up a renaissance in the industry, because what it actually meant was now if you want these improvements, Mr. Customer, you have to pay for them, whereas historically the industry just gave it away for free.
Stacy Rasgon, [10:10]-[10:43]. The core line is that the cost leg failing turned into a renaissance, because customers now pay for gains the shrink used to deliver free.
A lot of the spending has been on training. I used to get the question, do you think the spending will ever pivot from being training-dominated to inference-dominated? And my response has always been, well, it better. Because if it doesn’t, we have a problem. Training is important, you need to do it, but you don’t earn any money by training a model. Nvidia earns money by you training a model, but you buying the chips doesn’t. You have to be able to use the model, so that is inference. You get paid for that. And it’s funny, a lot of companies are like, we’re laying off employees to spend on AI, and it’s like, well, wait a minute, we’re spending more on tokens than we were on the employees those tokens are supposed to be replacing.
Stacy Rasgon, [24:21]-[25:56]. The core line is that you get paid for inference, not training, with the token-versus-payroll aside as the live check on AI ROI.
Nvidia just bought Groq. Why did they do that? I had this dumb simple view that a token is a token is a token. And what Jensen said, which stupidly makes a lot of sense when you think about it, is they’re not all the same. Some tokens can be monetized at a far higher rate than others. In particular, the tokens that require really low latency, really fast responses, if you’re a Neo Cloud renting this capacity, you can rent out that kind of capacity at much better economics. That’s why he bought Groq. That’s not a thing that a GPU is necessarily the best suited for.
Stacy Rasgon, [27:25]-[29:08]. The core line is that low-latency tokens monetize at a higher rate and a GPU is not best for them, which is the logic behind the Groq buy.
This should be a pretty strong year for WFE. But as strong as it is, it’s a constrained year. Because if I’m selling semiconductor manufacturing equipment, I need somewhere to put it. I need a fab. I need a clean room, or they call it a shell. They don’t have the factories. They have to build the factories first. So as strong as this year is, that is a constrained year. People will probably talk about AI bubbles or not. I don’t think this is a bubble. We’re not anywhere near crazy enough to be a bubble yet. There’s in some sense a hard physical limit on how much the industry can expand. There’s a natural constraint on how quickly things can ramp because we just don’t have the semiconductor capacity to do it.
Stacy Rasgon, [31:38]-[32:51]. The core line is that the physical limit on clean-room and wafer capacity is itself the reason this is not yet a bubble.
If you had Jensen, he said at one point we were going to be spending three or four trillion dollars a year on infrastructure. And it sounded crazy at the time, although I’d say we’re probably getting close to a trillion now, so three trillion off of a trillion isn’t as much of a stretch. But if you were to ask me, by 2030, if there was actual real demand to build up 3 trillion to 4 trillion of capacity a year, do we even have the power in place to do it? Probably not. The electrical grid probably can’t handle that. I did a piece of work with a colleague who covers electrical equipment, and it spat out that US electrical capacity has to grow about 5% a year for the next decade. I looked at that and said, yeah, okay, that sounds right. And my colleague looked at me like I had two heads. In his world, 5% a year is unachievable.
Stacy Rasgon, [47:44]-[48:59]. The core line is the 5-percent-a-year grid growth his model required and the power analyst called impossible, relocating the binding constraint from silicon to power.

