Yiran (@yiran2037840)

Yiran@yiran2037840

2026.05.06 16:37

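Seeing users get poached by OpenAI lately, Anthropic (nicknamed "阿迪王") finally got anxious 🤣. It has rented Musk's idle compute and started copying OpenAI's doubled usage limits. Good news! A few days ago I also saw a post claiming that only 11% of xAI's compute was in use (roughly 90% idle), and now today xAI has reinvented itself as a compute-rental provider too 🤣. So does this mean API relay services can double their profits overnight? 🤔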

Wall St Engine@wallstengine

2026.05.06 16:21

Anthropic: Claude Code is getting higher limits. The company is doubling Claude Code’s 5-hour rate limits, removing peak-hour limit reductions, and raising API rate limits for Claude Opus models, giving heavier users more room before hitting caps.

Yiran@yiran2037840

2026.05.05 02:11

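The real end-to-end performance of NVIDIA GB300 Ultra NVL72 on the vLLM inference engine reached 2.7x that of GB200 NVL72, far beyond the roughly 1.5x on-paper hardware gains in FLOPs and memory capacity. This improvement comes essentially from compounding optimizations across the full AI infra stack. Under real AI serving workloads (the middle of the throughput vs. interactivity curve), it produces a 1+1>2 amplification effect.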

SemiAnalysis@SemiAnalysis_

2026.05.04 21:00

MINECRAFT STEVE ALERT: GB300 ultra NVL72 is already 2.7x faster 🚀 than GB200 NVL72 on one of the industry standard inference engine known as @vllm_project. On paper, GB300 only has ~1.5x faster NVFP4 FLOP & 1.5x more HBM capacity & same HBM BW than GB200 but due to the full stack optimization with compounding gains, in the middle of the curve where most providers serve at, GB300 is up to 2.7x faster. End to End performance is the gold standard of performance, not on paper theoretical flops. Thanks to the 10x engineers at NVIDIA & @inferact & @coreweave for this temporary gb300 for open source projects!

Yiran@yiran2037840

2026.05.04 14:22

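Neoclouds kept surging today, especially NBIS, the neocloud name I am most bullish on. But there are still medium- to long-term constraints, such as debt ratios and depreciation rates. I think the current move leans toward short-term speculation, so watch the risk and don't FOMO in. For the medium- to long-term limitations of neoclouds, see fin哥's post, which gives a very thorough perspective.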

Yiran@yiran2037840

2026.05.03 05:42

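Today I want to talk systematically about AI Infra, especially inference optimization, and why it has become so important at this point in 2026. I work on AI Infra myself, and the longer I work in this field, the more strongly I feel one thing: AI competition has long since stopped being only about model parameters, benchmarks, and product experience. At its core, it has become a competition in AI Infra.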

Yiran@yiran2037840

2026.05.02 09:31

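Actually, I think OpenAI's strategy here is right. In practice, the vast majority of users (99%) are not programmers and can't write code. For those 99%, AI is something you use for everyday tasks, and this wave of major improvements in Codex's general-purpose Agent capabilities taps exactly the market share of non-professional users. People who run a lot of AI tasks are seriously short on tokens, yet at the same price tier OpenAI's quota and service quality are far more generous and stable than Anthropic's. That "token peace of mind" and "token stability" is enough to make most people switch to Codex.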

Yiran@yiran2037840

2026.05.02 09:16

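A quick thought: the Codex app has been getting rave reviews lately, which may drive more sign-ups for the $100 or $200 ChatGPT Pro plans. I also just saw from someone I follow that GPT-5.5 API sales have surpassed Claude's. (Unverified; treat with caution.) Once people have experienced that power that "knocks you out and leaves you slumped in your chair," there is no going back. I just recommended the Codex app to quite a few friends today, and the common feedback afterwards was that it is great to use but the tokens run out, so they are rushing to top up to the $100 or $200 plan 🤣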

Yiran@yiran2037840

2026.05.02 04:42

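Just now, a blogger raised numerical objections to the recently viral post "estimating the parameter counts of GPT-5.5 and Opus-4.7." The blogger's conclusion: GPT-5.5 ~1.5T, Opus-4.7 ~1.1T. Notably, the blogger accepts the "incompressible probe" method and only corrects the numbers. My take: even with the numbers revised down, GPT-5.5's parameter count is still far larger than Opus's. On top of that, OpenAI can still serve almost all consumer users stably. I personally think the compute Anthropic reserves for ordinary consumer users (non-Mythos models) is far smaller than what OpenAI reserves for GPT-5.5. In addition, I think it is highly likely that OpenAI has made significant breakthroughs in inference infra and in its partnership with Cerebras, enabling efficient reuse of KV Cache and compute resources and, in turn, fast and stable service quality.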

Lawrence Chan@justanotherlaw

2026.05.02 02:17

A recent viral paper claims to reverse-engineer the parameter counts of frontier models: GPT-5.5 = 9.7T, Opus 4.7 = 4.0T, o1 = 3.5T, etc. @ben_sturgeon and I investigated and found serious issues in the paper; fixing them gives GPT-5.5 as ~1.5T (90% CI: 256B-8.3T).