Breaking the "Memory Wall": Optical Interconnects Emerge in GPU–HBM Packaging
As a solution to the "memory wall," one of the chronic challenges in AI semiconductors, the memory and packaging industries at home and abroad are weighing an approach that decouples the GPU and high-bandwidth memory (HBM) and packages them separately. The core idea is to move the HBM—until now mounted right next to the GPU—a certain distance away, and bridge the gap with light (optics), allowing several times more HBM to be installed than is possible today.
On the 22nd, a researcher at a major domestic memory maker said, "We're currently struggling to expand HBM bandwidth and capacity, so we're discussing with customers a plan to overcome the GPU's shoreline limit through optical interconnects and mount more HBM." Shoreline refers to the length of the chip's perimeter, the edge along which connections to neighboring chips are routed.
In today's AI computing environment, the key factor dragging down compute efficiency is the data transfer speed of memory chips. While GPU performance has grown by leaps and bounds with each generation, the speed at which memory stores and supplies data has failed to keep pace—creating a structural performance barrier, the memory wall. The arrival of HBM, with its wide data pathways, put out the immediate fire, but critics continue to point out that bandwidth and transfer speeds still fall short of handling the explosive growth in AI compute.
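The memory wall described above is often illustrated with a roofline-style estimate, in which sustained throughput is the smaller of peak compute and what memory bandwidth can feed. The following is a minimal sketch under that assumption; the function name and every number in it are hypothetical placeholders chosen for illustration, not figures from the article or from any real GPU.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float, flops_per_byte: float) -> float:
    """Roofline-style estimate of sustained throughput in TFLOP/s.

    bandwidth_tb_s * flops_per_byte is the rate at which memory can feed
    the compute units (TB/s * FLOP/byte = TFLOP/s).
    """
    memory_bound = bandwidth_tb_s * flops_per_byte
    return min(peak_tflops, memory_bound)


# Hypothetical accelerator and workload; none of these figures come from the article.
PEAK = 1000.0      # peak compute, TFLOP/s
BANDWIDTH = 4.0    # memory bandwidth, TB/s
INTENSITY = 2.0    # FLOPs performed per byte moved from memory

baseline = attainable_tflops(PEAK, BANDWIDTH, INTENSITY)            # 8.0
faster_compute = attainable_tflops(2 * PEAK, BANDWIDTH, INTENSITY)  # 8.0, unchanged
more_bandwidth = attainable_tflops(PEAK, 2 * BANDWIDTH, INTENSITY)  # 16.0
print(baseline, faster_compute, more_bandwidth)
```

In this toy setting, doubling peak compute changes nothing, while doubling bandwidth doubles sustained throughput. That is the sense in which a memory-bound workload sits behind the wall.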
Until now, the industry has focused on stacking HBM ever higher to increase memory capacity and bandwidth within a confined footprint. But as stack counts climbed past 12 and 16 layers toward 20 and beyond, process difficulty rose exponentially. The technology hit physical limits, including the growing difficulty of meeting fixed height specifications. Vertical stacking has reached an inflection point—so much so that the JEDEC standards body has relaxed its HBM height specifications.
The bigger problem is that the obvious alternative to taller stacks, adding more HBM horizontally around the GPU, is also blocked. In the current 2.5D packaging structure, the GPU and HBM are mounted tightly together on a single substrate. Within this structure, the number of HBM units that can be placed is strictly limited by the finite length of the GPU chip's perimeter, its shoreline. Even when more HBM is desired, there is physically no room to place it, leaving the industry in a structural deadlock.
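The shoreline constraint can be approximated with simple geometry: each HBM stack placed next to the GPU occupies a stretch of die edge, so the count is bounded by edge length rather than by stack height. The sketch below assumes a rectangular die with stacks placed side by side along some of its edges. The function names and all dimensions are hypothetical and are not taken from the article or from any real product.

```python
from typing import Sequence


def stacks_along_edge(edge_mm: float, stack_width_mm: float, gap_mm: float) -> int:
    """Number of stacks that fit side by side along one die edge.

    n stacks need n * stack_width_mm + (n - 1) * gap_mm of edge length,
    which rearranges to n <= (edge_mm + gap_mm) / (stack_width_mm + gap_mm).
    """
    return int((edge_mm + gap_mm) // (stack_width_mm + gap_mm))


def max_adjacent_stacks(usable_edges_mm: Sequence[float],
                        stack_width_mm: float,
                        gap_mm: float) -> int:
    """Total stacks over all die edges that are available for memory."""
    return sum(stacks_along_edge(e, stack_width_mm, gap_mm) for e in usable_edges_mm)


# Hypothetical geometry: two 30 mm die edges reserved for memory,
# 11 mm wide stacks, 1 mm spacing between neighboring stacks.
print(max_adjacent_stacks([30.0, 30.0], 11.0, 1.0))  # 4
```

Stack height does not appear anywhere in the calculation. Under this simplified model, the only ways to raise the count are a longer usable edge or a narrower stack, which is why moving the HBM away from the die edge removes the cap.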
The alternative now emerging across the semiconductor industry is to separate the GPU and HBM and package them independently. It overturns the conventional chip-design principle that components must sit close together to minimize data transfer time. Instead of keeping the two chips adjacent, the approach spaces them apart and links them with overwhelmingly fast optical signals to overcome the added physical distance.
Placing the HBM slightly away from the GPU on the board frees the design from the GPU's shoreline constraint. With the spatial limitation gone, several times more HBM than today can be spread out laterally across the board, without having to push stack heights to extremes. This means the total memory capacity and data bandwidth of the AI accelerator system would expand well beyond what current systems offer.
"Discussing Placing HBM Beneath the GPU"… Form Factor Could Change
The industry is now producing a range of architectural proposals for where exactly to place the HBM on the GPU board.
The same memory researcher said, "Options under discussion range from broadly utilizing the space immediately around the GPU to isolating the HBM beneath the GPU board." He added, "In the latter case—isolating it beneath the GPU board—the motherboard would have to be extended lengthwise, so we're discussing even an overall form-factor change with the GPU maker." Specifically, the HBM might surround the GPU from several centimeters away, or a separate HBM zone might be created in the center of the board.
"We're keeping every possibility open as we discuss the optimal layout," he said. "Nothing has been confirmed as an official roadmap yet, but as part of preliminary research toward next-generation AI accelerators, we're in talks with our partners."
The outsourced semiconductor assembly and test (OSAT) industry is also watching this trend closely. An executive at a global OSAT firm said, "Optical interconnects are a clear trajectory. The only question is timing," predicting that "rack-to-rack and server-to-server links will go optical first, and then chip-to-chip connections within the board will follow." He added, "The larger units will be connected by light first, but optical research is moving so fast that [chip-to-chip links] may not be that far off."
Technically, the optical interconnect linking GPU and HBM works on the same underlying principle as the links connecting server to server inside a data center. The difference, and the high technical barrier, lies in shrinking optical-conversion technology that was once used for communication between large pieces of equipment down to the microscopic scale of a single board and chipset.
An executive at a domestic developer of co-packaged optics (CPO) components explained, "As HBM stack heights approach their limit, the industry is discussing spreading the memory out laterally to maximize how much can physically be mounted." He added, "The principle is the same as conventional data-center optical interconnects, but HBM optical links that have to operate within a confined board space require optical components to be miniaturized to far smaller sizes and far higher integration density—so the technical difficulty is greater."
JUST IN: OPENAI IS PREPARING POSSIBLE LEGAL ACTION AGAINST APPLE $AAPL OVER THE CHATGPT PARTNERSHIP ANNOUNCED IN JUNE 2024.
OpenAI lawyers are working with an outside firm on a range of options.
The OpenAI executive quote tells you everything, per Bloomberg:
"We have done everything from a product perspective. They have not, and worse, they haven't even made an honest effort."
What OpenAI is considering:
- A formal notice alleging breach of contract (does not require a full lawsuit at the outset)
- Action likely won't come until after the OpenAI v. Elon Musk trial concludes
What went wrong:
- OpenAI expected the deal to generate billions per year in ChatGPT subscriptions
- Apple users overwhelmingly use the standalone ChatGPT app, not the Siri integration
- Users have to specifically say "ChatGPT" to invoke OpenAI through Siri
- Apple pitched the deal as being on par with its Google Safari search arrangement (which generates tens of billions annually for both sides)
The competitive shift:
- Apple is opening its platforms to rival AI providers later this year
- Apple is testing both Anthropic's Claude and Google Gemini
- A revamped Siri with an AI model picker is slated for WWDC on June 8 (iOS 27)
- Apple is paying Google $1B annually for Gemini to revamp its own AI models
- OpenAI declined to work with Apple on the new models because it "felt burned"
Closing quote from the OpenAI executive: "Apple has so much market power that they can dictate terms. We already took this leap of faith with you, and it didn't work out well."