
AIไบงไธšๆŒ–ๆŽ˜๐Ÿ”
@QihongF44102
Veteran AI workhorse who has lived through multiple cycles of AI upheaval
309 Following    10.5K Followers
PREDICTION: Anthropic will surpass Alphabet in revenue by mid-2028.

This is not a bull case or an acceleration scenario; it is a continuation of the curve already in evidence. Anthropic's ARR went from $1B (Jan 2025) to $9B (Dec 2025) to $30B (Apr 2026), a 3.3x step in a single four-month window, and the curve has been steepening, not flattening. My projection actually assumes deceleration from here: $100B by end of 2026, $340B in 2027, $850B in 2028, $1.4T in 2029, $2T by 2030.

Crossover with Alphabet happens at ~$575B in mid-2028, not because Anthropic accelerates beyond today's pace, but because Alphabet, locked at ~15% YoY in a mature ads-and-cloud business, cannot match enterprise AI's adoption physics. As @rodriscoll intelligently observed recently, Gemini tokens served grew by only 60% in the last quarter … while Anthropic grew by 10X.

Three drivers make the continuation structural, not speculative: (1) customers spending >$1M/year with Anthropic doubled from 500 to 1,000 in under two months post-Series G (these are multi-year expanding contracts with near-zero churn, because switching a deployed agent stack mid-flight is operationally untenable); (2) Claude Code is the wedge, not the product, dragging the rest of the platform (agents, MCP, healthcare, biotech) into every Fortune 2000 deployment as an attach point; and (3) compute supply is finally non-binding with the 3.5GW Google + Broadcom deal (2027+), this week's SpaceX partnership, and 1GW of standing Google capacity for 2026. For most of 2024–2025 the bottleneck was supply, not demand. That constraint is releasing exactly when the demand curve is steepest.

The standard objection, "no company has ever sustained this at scale," applies a software-era frame to a labor-era business. AWS, Azure, and Meta decelerated at $50–100B because they sold tools to the economy. Anthropic is selling cognitive capacity into the economy. The TAM isn't enterprise software ($800B). It's labor ($50T+). When the denominator is two orders of magnitude larger, "deceleration at $100B ARR" stops being a law and starts being an assumption.

The crossover isn't a maybe. It's a function of timing. Mid-2028 is when I think Anthropic surpasses Google.
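To make the "assumes deceleration" claim concrete, here is a minimal Python sketch that computes the year-over-year multiples implied by the projection quoted in the post. The year-end figures come from the post. The two mid-2028 interpolations are assumptions of this sketch, not something the post specifies.

```python
# Year-end ARR projection quoted in the post, in billions of dollars.
projection = {2026: 100, 2027: 340, 2028: 850, 2029: 1400, 2030: 2000}


def yoy_multiples(series):
    """Growth multiple of each year over the previous year."""
    years = sorted(series)
    return {cur: series[cur] / series[prev] for prev, cur in zip(years, years[1:])}


def midyear_linear(series, year):
    """Mid-year value assuming straight-line growth within the year (assumption)."""
    return (series[year - 1] + series[year]) / 2


def midyear_geometric(series, year):
    """Mid-year value assuming a constant growth rate within the year (assumption)."""
    return (series[year - 1] * series[year]) ** 0.5


multiples = yoy_multiples(projection)
for year, multiple in multiples.items():
    print(f"{year}: {multiple:.2f}x over the prior year")

print(f"mid-2028, linear:    ${midyear_linear(projection, 2028):.0f}B")
print(f"mid-2028, geometric: ${midyear_geometric(projection, 2028):.0f}B")
```

Under these assumptions the multiples shrink every year, and the post's ~$575B crossover figure falls between the two interpolated mid-2028 values.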
$MU $DRAM $SNDK

Let me break down this Morgan Stanley chart precisely. The blue bar is what the industry can produce by 2027 at the current trajectory. The tan/beige bar on top is the incremental DRAM demand that Agentic AI specifically adds by 2030 on top of that 2027 supply baseline.

Lower bound: Agentic AI adds 26% more DRAM demand on top of 2027 supply. Meaning even in the conservative scenario the industry needs to produce roughly a quarter more than everything it can make by 2027 just to meet the additional Agentic AI driven demand by 2030.

Upper bound: Agentic AI adds 77% more DRAM demand on top of 2027 supply. Meaning in the bull case the industry needs to produce nearly double what it can make by 2027 just to meet AI demand by 2030.

Mind the mid-point: ~52%. That's huge in just a few years.

This is incremental demand on top of everything already projected. Morgan Stanley is not replacing the existing DRAM demand forecast for 2030. They are saying Agentic AI creates an additional 15 to 45 exabytes of demand that did not exist in prior models. Extraordinary.

Why? The industry cannot simply flip a switch and produce 77% more DRAM by 2030. HBM alone consumes 3 to 4 times more wafer space than standard DRAM per gigabyte. Supply takes years to build. Fabs take 3 to 5 years to come online. Morgan Stanley is essentially saying it is a structural gap that widens every year through 2030.

2031 to 2040 is the decade that matters most to me. The race for Artificial General Intelligence.

Think about what AGI actually means. Not a smarter chatbot. Not a faster agent. A system that thinks the way humans think. Learns the way humans learn. Remembers the way the best human minds remember. A true AGI does not forget your conversation from yesterday. It does not lose context from a week ago. It builds on every interaction. Every piece of information. Every relationship between ideas. Continuously. Permanently.

That is near infinite context. And near infinite context requires near infinite memory. The 15 to 45 exabyte DRAM demand Morgan Stanley projects for 2030 is the appetizer. AGI is the main course.
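As a rough consistency check on the figures quoted above, the sketch below backs out the 2027 supply baseline implied by each bound. Pairing 15 EB with the 26% case and 45 EB with the 77% case is an assumption of this sketch. The post gives both ranges but does not state the baseline itself.

```python
def implied_baseline_eb(incremental_eb, uplift_fraction):
    """2027 supply baseline (in exabytes) implied by an incremental demand
    figure and the fractional uplift it represents over that baseline."""
    return incremental_eb / uplift_fraction


# Assumed pairing: low end with low end, high end with high end.
low_case = implied_baseline_eb(15, 0.26)
high_case = implied_baseline_eb(45, 0.77)
midpoint_uplift = (0.26 + 0.77) / 2

print(f"implied 2027 baseline, lower bound: {low_case:.1f} EB")
print(f"implied 2027 baseline, upper bound: {high_case:.1f} EB")
print(f"mid-point uplift: {midpoint_uplift:.1%}")
```

Both bounds imply a baseline of roughly 58 EB, so the percentage range and the exabyte range in the post are at least consistent with each other under this pairing.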
This post on $MU (Micron), about a Sanjay Mehrotra talk at a smallish conference in Silicon Valley that I attended just last Friday, got 90k views. That should have been telling. I have been in $MU since $330 and I don't have enough.
OpenAI really cooked with Codex and GPT 5.5. @openai/codex going from 5.7M to 163M weekly npm downloads in one week is absolutely insane. Anthropic is cooked.
On Homebrew (a secondary source), Codex is being installed on macOS at 1.77× the rate of Claude Code right now: 836 installs/day vs 473 installs/day, observed this morning.
AI Semiconductor Endgame 2026 (Part 1)
New Token Economics: The Computing Paradigm Shifts from GPU Compute to HBM

This article starts from the essence of GPU architectural evolution to address a question the market has long worried about: why must each GPU's HBM memory demand grow exponentially, and why won't this exponential growth in HBM demand stall?

It then derives the first principle of token economics under the current architecture:

token throughput = HBM size × HBM BW (bandwidth)

It also discusses why the GPU ceiling is determined by HBM's two dimensions of progress.

The topic of HBM cyclicality has long been controversial. Optimists argue that AI-driven demand is much greater than before, but the market mainstream still believes that previous up-cycles also saw 20%+ annual demand growth, so what's different this time? On that view, AI doesn't change the fact that HBM, like traditional DRAM, has commodity attributes: once capacity expansion at the demand peak meets a downturn, history will repeat itself.

We can take the perspective of compute-chip architecture, start from first principles, and reason through why this time is genuinely different.

----------

History: The Era of CPU Compute

For a very long time, we lived in the era of CPU-dominated compute. The CPU's top-level KPI was performance (running faster), and so each generation of CPUs deployed every method imaginable to push benchmark scores higher. First it was rising clock frequencies, then it was architectural evolution: superscalar designs, and so on.

During this period, why didn't DDR need to advance technologically at high speed? DDR3 to DDR5 took a full 15 years. Because in this era, DDR's role was purely auxiliary, and only weakly so. By industry experience, even doubling DDR speed would generally raise CPU performance by less than 20%.
Why did improvements in DDR bandwidth and speed matter so little? Two reasons:

1. CPUs designed all kinds of architectural tricks to hide DDR latency: superscalar designs, wider issue widths, massive ROBs and register renaming to extract parallelism, and L1 and L2 caches. All of these weakened the demand for DDR bandwidth and speed.
2. CPU workloads don't have particularly demanding bandwidth requirements. For most everyday workloads (say, opening a webpage), DDR bandwidth is severely overprovisioned. Even cloud workloads often look the same.

In other words, in the CPU era, DDR bandwidth and speed didn't really matter. There was virtually no difference between DDR4 and DDR5 except in a handful of games, and even the JEDEC standard advanced slowly.

On top of that, only a small portion of any given app needs to permanently sit in DDR. Whatever is needed can be paged in from the hard drive on demand. App size grew slowly, and so DDR capacity demand grew slowly as well. That's why, over the past decade, the average PC went from 7–8GB of DDR to about 23GB: only 3× growth in ten years.

This slow upgrade pace directly affected revenue. Capacity-based pricing was the main way of making money; speed improvements were just a technological upgrade that raised the unit price of capacity. With both of these dimensions advancing slowly, growth could only come from increases in PC/phone unit volumes.

So along both dimensions, bandwidth/speed and capacity, DRAM was always a "nice-to-have" appendage to the chip industry. The marginal utility of DDR upgrades was very low, and almost completely disconnected from the CPU era's top-level KPI.

----------

The Paradigm Shift: GenAI's Top-Level KPI

When we entered the era of GenAI large models, the computing paradigm shifted, and the top-level KPI changed fundamentally.
By the time GPUs evolved into AI inference engines, the top-level KPI was no longer compute alone (TOPS/FLOPS), as it had been for CPUs. It became the cost of a token. Specifically: overall token throughput per unit cost / per unit power.

A close second is token speed, because in the agent era many tasks have become serial, and token output speed has become a critical bottleneck for user experience.

This is exactly why Jensen invented the concept of the AI factory: to produce the most tokens at the lowest cost, while pushing token speed as high as possible.

In the AI training era, Jensen's economics were TCO (Total Cost of Ownership): the more GPUs you buy, the more you save. In the inference era, Jensen's token economics flip the logic. AI inference has very healthy gross margins, so the logic now becomes: the NVIDIA GPU is the GPU that produces the cheapest token in the world, so the more you buy, the more you earn.

The top-level KPI has become a Pareto frontier: along the two dimensions of token throughput and token speed, optimize as far as possible. Each generation of NVIDIA's token factory is essentially pushing the entire Pareto frontier up and to the right. This is the most important KPI of the AI inference era.

----------

From Token Throughput to HBM: The Core Logic Chain

Below is the most important logical chain of this article: how to start from the exponential growth of token throughput and derive that the ceiling bottleneck lies in the exponential growth of HBM size and HBM speed.

In the era of single-GPU inference with a single thread and batch size = 1, token throughput had only one dimension: HBM bandwidth. Higher bandwidth = higher token throughput. But once we entered the NVL72 era, inference is no longer single-GPU.
It is a system-level token factory composed of 72 GPUs + 36 CPUs, designed to fully saturate HBM bandwidth and compute simultaneously, in pursuit of the ultimate token throughput.

Token throughput growth depends on two things: the number of requests batched simultaneously × the average token speed per request. That is: batch size × token speed.

Take Rubin NVL72 as an example. At an average token speed of 100 tokens/s, processing 1,920 simultaneous requests yields a token throughput of 192,000 tokens/s. A Rubin NVL72 draws roughly 120kW (0.12MW), so per MW it can handle 1.6M tokens/s.

So we need to find ways to push both parameters up: batch size and average token speed. Their product is our top-level KPI, token throughput.

Parameter 1: Batch growth (bottleneck is HBM size)

Every request in the batch carries its own KV cache, with sizes ranging from a few GB to tens of GB. Because hot KV cache must be read at high frequency and high speed at any moment, it must reside in HBM. For a model with, say, 80 layers, every token generation step requires reading the KV cache 80 times from HBM.

As batch size grows, hot KV cache grows linearly. And because the hot KV cache for every request in the batch must sit in HBM, HBM size must grow linearly with batch size.

Like an airport shuttle bus: the gate wants to move passengers to the plane as fast as possible. If HBM size is small, the shuttle is small, so you have to make extra trips.

Conclusion: batch size growth bottlenecks on HBM size growth.

Parameter 2: Average token speed per request (bottleneck is HBM bandwidth)

The decode-phase speed of a large model bottlenecks on HBM bandwidth, because every token generated requires reading the activated weights and KV cache many times over. The emergence of LPUs has, in cases where batch size isn't very large, moved the activated weights portion onto SRAM, but every generated token still requires many reads of the KV cache from HBM.
The higher the HBM bandwidth, the faster each token is generated, in essentially linear correspondence. Like the airport shuttle bus: HBM bandwidth is like the width of the door, and wider doors mean passengers board faster.

The rest of the GPU's configuration is essentially adapted to support batch growth and to keep token compute speed in step with HBM growth. In some cases the GPU even spends excess compute to recover effective bandwidth (e.g., bandwidth compression techniques).

--------

To return to the shuttle bus analogy:

• Shuttle bus cabin size = HBM size (capacity): determines how many passengers can fit at once (i.e., how many requests' KV caches can sit in HBM simultaneously). Bigger cabin = more passengers (higher batch size) per trip. If the bus is too small, moving 100 people takes two trips, and total throughput suffers.
• Shuttle bus door width = HBM bandwidth: determines how fast passengers get on and off. With a wide door, everyone piles on at once (decode/token generation is fast). With a narrow door, even with a giant cabin, people queue up and most of the time is spent boarding.
• Passenger throughput = cabin size × door-width-determined boarding speed.

--------

At this point, we've logically derived the first principle of token-economics hardware demand:

Token throughput = HBM size × HBM bandwidth

The top-level KPI of the AI inference era is highly dependent on progress along both HBM dimensions. If we want to maintain 2× token throughput growth per generation, each generation of single GPU must grow HBM size × HBM BW by 2×!

This is the first time in history that HBM memory size can influence the top-level KPI, token throughput.

To validate this thesis, we can put NVIDIA's token throughput from A100 to Rubin Ultra on the same chart as HBM size × HBM BW. What you find is that the two curves track each other startlingly closely on log axes.
HBM size × speed actually grows even faster than token throughput, which makes sense, because HBM defines the ceiling, and in practice utilization of that ceiling is very hard to push to 100%. Even if HBM size × HBM speed grew by 1,000×, with the supporting compute and architecture, it would be very hard to wring out the full 1,000× of headroom.

This curve isn't a coincidence. It's the necessary solution of system optimization: throughput = batch × speed. This is the unavoidable first principle of token factory economics.

--------

What about software? Won't software optimization reduce bandwidth demand? Reduce HBM demand?

This is an independent dimension from hardware. It's like asking: if software on a CPU runs faster after optimization, does that mean the CPU doesn't need to advance for ten years? After all, software is faster now. If that were the case, would CPU vendors still make money?

For a CPU vendor to survive, there's only one path: in standardized benchmarks, ignoring software optimization, every new CPU generation must score higher. Otherwise it doesn't sell.

GPUs are exactly the same. How well software is optimized, and the requirement that the GPU's own token-throughput KPI must improve dramatically every year, are two separate things. As long as token demand keeps growing, the pursuit of higher token throughput will not stop, and so neither will the pursuit of higher HBM size × HBM speed.

If HBM size and HBM speed were to slow down, Jensen would personally fly to the Big Three and pressure them to accelerate, because that is his GPU ceiling. If the ceiling stops rising, can his GPU still sell?

Of course, NVIDIA also needs to wrack its brains to extract performance beyond the HBM ceiling through heterogeneous architectural angles. The LPU is a great example: it improved the Pareto frontier substantially from a different angle (the right-hand, high-token-speed portion).
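The chain of reasoning above (throughput = batch × speed, batch bounded by HBM capacity, speed bounded by HBM bandwidth) can be sketched in a few lines of Python. The Rubin NVL72 figures (1,920 requests, 100 tokens/s, 120kW) are the ones quoted in the article. Every capacity, weight-size, KV-cache, and bandwidth number below is a hypothetical value chosen only to illustrate the scaling, and the helper names are made up for this sketch.

```python
def token_throughput(batch_size, tokens_per_s_per_request):
    """System throughput = simultaneous requests x average per-request speed."""
    return batch_size * tokens_per_s_per_request


def throughput_per_mw(throughput_tokens_per_s, power_kw):
    """Normalize throughput to one megawatt of power draw."""
    return throughput_tokens_per_s * 1000 / power_kw


def max_batch(hbm_gb, weights_gb, kv_cache_gb_per_request):
    """Requests whose hot KV cache fits in the HBM left over after weights."""
    return int((hbm_gb - weights_gb) // kv_cache_gb_per_request)


def decode_tokens_per_s(hbm_bandwidth_gb_s, gb_read_per_token):
    """Bandwidth-bound decode speed: each token needs a fixed volume of reads."""
    return hbm_bandwidth_gb_s / gb_read_per_token


# Figures quoted in the article for Rubin NVL72.
rubin = token_throughput(batch_size=1920, tokens_per_s_per_request=100)
print(rubin, throughput_per_mw(rubin, power_kw=120))

# Hypothetical single-GPU numbers, for illustration only.
print(max_batch(hbm_gb=288, weights_gb=88, kv_cache_gb_per_request=8))
print(max_batch(hbm_gb=576, weights_gb=88, kv_cache_gb_per_request=8))
print(decode_tokens_per_s(hbm_bandwidth_gb_s=8000, gb_read_per_token=80))
print(decode_tokens_per_s(hbm_bandwidth_gb_s=16000, gb_read_per_token=80))
```

In this simplified model, doubling bandwidth doubles per-request speed, while doubling capacity more than doubles the batch because the weights are a fixed cost. Real systems fall short of both bounds, which is the article's point about the ceiling being hard to saturate.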
--------

HBM memory has now bid farewell to that old era of drifting with the tide. On this one-way road paved by exponential demand, it has, in something close to a destined fashion, walked onto the central stage of the industry's epic.

When the inference paradigm's first principles evolve to this point, as long as Jensen still wants to sell GPUs, HBM must double, and it must double every generation.

This is endogenous pressure that the GPU roadmap places on the memory suppliers. It has nothing to do with AI demand, nothing to do with macro cycles, and nothing to do with the moods of the hyperscalers.

The only remaining question is this: when demand has been physically locked into exponential growth, will the three players on the supply side, as they have for the past thirty years, once again drag themselves back into the mire of the cycle by their own hands?
The Underpriced Truth: Agentic AI Is a Paradigm Shift Centered on Memory

1/ The market will slowly realize: Agentic AI is memory-centric, not compute-centric.
The new hardware stack is:
① Memory: HBM / DRAM / NAND
② Parallel compute: GPU / ASIC
③ Coordinator: CPU
CPUs stopped doing the heavy lifting a long time ago.
This isn't a cycle. It's a paradigm. 🧵👇

2/ First principles
Humanity's ultimate pursuit of intelligence has always been two things: infinite memory + infinite compute.
When we say someone is smart, we mean two things: "good memory" + "fast thinking."
Machine intelligence is walking the exact same path.

3/ The story the market already understands: HBM
LLM inference's decode stage is a textbook memory-bound workload.
Every token generated → drag the entire KV cache across memory.
Bandwidth too low → expensive GPUs sit idle.
That's why every new GPU generation ships with more HBM bandwidth and capacity.

4/ The story the market is missing
The "1M context" you keep hearing about? It is not assembled inside the GPU inference cluster.
So where is it actually built?

5/ It's built on the traditional servers running the agentic system
Those CPU + huge-DRAM servers are quietly doing the heaviest lifting:
• loading user long-term & short-term memory
• loading the agent's system spec / prompt
• loading skill / tool / subagent definitions
• compressing the context once it overflows 1M tokens
All of this lives in DRAM, not HBM.

6/ Compare this to the previous era
In the web / mobile era, we barely stored any user context at all.
Only search / recsys / ads kept a small user profile: maybe 1/20, even 1/100 of the data volume an agentic system needs today.
That asymmetry is the real overlooked inflection point.

7/ The supply chain is already telling this story
Server CPU : DRAM ratio is climbing fast:
• Web / Mobile era: 1 core : 4 GB
• Agentic AI today: 1 core : 16 GB
• Deep agentic future: 1 core : 64 GB and beyond

8/ And it's NOT just "4x more memory"
Under agentic workloads, a single CPU serves a fraction of the users it used to.
When the entire IT stack migrates to agentic:
• CPU count grows several-fold, up to ~10x
• Total DRAM grows by tens of times, up to ~100x
That's the part nobody is pricing in.

9/ The conclusion
Agentic AI is a paradigm shift centered on storage + parallel compute.
The software paradigm changed. The hardware paradigm changed with it.
Only those who deeply understand the technology will see it:
This isn't a memory cycle. It's a memory paradigm.

10/ Time horizon
Given how early we still are on:
• user adoption rate
• depth of usage per user
We are at least 5 years away from the cyclical top of this memory wave.
(Zoom out far enough and everything is a cycle, but this one is nowhere near peak.)

$MU $DRAM $SNDK
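
The compounding in 7/ and 8/ can be checked with a small Python sketch: total DRAM scales with GB per core times the number of cores. The GB-per-core ratios and the ~10x CPU figure come from the thread. Treating CPU count as a proxy for core count, and reading "several-fold" as 3x, are assumptions of this sketch.

```python
# GB of DRAM per CPU core, as listed in the thread.
GB_PER_CORE = {"web_mobile": 4, "agentic_today": 16, "deep_agentic": 64}


def dram_multiplier(era, cpu_count_multiplier, baseline="web_mobile"):
    """Total DRAM growth vs. the baseline era: per-core ratio x CPU count growth.
    Assumes cores per CPU stay constant (an assumption of this sketch)."""
    per_core_growth = GB_PER_CORE[era] / GB_PER_CORE[baseline]
    return per_core_growth * cpu_count_multiplier


# "Several-fold" is read here as 3x (hypothetical); ~10x is the thread's upper figure.
for era in ("agentic_today", "deep_agentic"):
    for cpu_growth in (3, 10):
        growth = dram_multiplier(era, cpu_growth)
        print(f"{era}, CPUs x{cpu_growth}: total DRAM x{growth:.0f}")
```

Under these assumptions the result spans roughly 12x to 160x, which brackets the thread's "tens of times, up to ~100x" range.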
Hana Securities' bull case estimates that next year, NVIDIA alone will account for 72% of total LPDDR supply. Did you hear me, anon?
It is true
SemiAnalysis's Dylan Patel: People are like, 'oh, the memory story is overplayed, everyone gets it.' No, no, no, you don't get it. DRAM will double or triple from here still, because that's how much capacity is required, and they have to steal capacity from somewhere else. And the only way to steal capacity in a capitalist economy is demand destruction via higher pricing.