ironically think it'll be a sad time for ai researchers this year. they are first in the hotpath of RSI and probably the market for them will shrink or at least their pricing power will be reduced as this generation of models commoditizes the skills that made them rare
We've actually gone farther than this. Nemotron 3 Super (120B-12A) was pretrained on 25T tokens in NVFP4. Nemotron 3 Ultra was also pretrained in NVFP4.
This research paper advances the state of NVFP4 pretraining, but it is not just research: we are using NVFP4 for our most important pretraining work.
Over the last week, we had to say goodbye to the little orange menace we affectionately referred to as "The Boy".
Hug your pets just a little tighter - they're too good for us.
We've gone even farther:
Nemotron 3 Super is 120B and pretrained on 25T tokens in NVFP4.
Nemotron 3 Ultra is ~500B and also pretrained in NVFP4.
Accelerated computing means we rethink every aspect of the AI stack looking for new opportunities to improve efficiency.
amazing post and great timing w.r.t. ant's post yesterday
we must build open ai to not get locked in by the vendors who will decide who gets which capabilities
and the west has to realize that open models are important and support open model efforts (like @arcee_ai, @NVIDIAAI)
Imagine the generational aura loss if you said:
"China is so cracked compared to the US that if we had a fair compute playing field we would definitely lose for sure despite having a literal massive headstart and infrastructure/supply chain advantage."
suuuuper excited to be collaborating with the excellent LangChain Labs team on this effort
prod agent tracing is the seed that lets you close the loop for continual learning. too much data gets collected but not used for learning. time to change that :)
What is a claw?
It's the shift from AI that suggests → AI that acts.
Autonomous agents that run 24/7, handling complex work in the background so you don't have to.
Time to yap on some smol MoE's today. If you're around AI council, my talk is at 10!
Followed by @latkins, @ezi_ozoani, @llm_wizard, and @samsja19
Everything from pretraining at home to large scale RL
told you to not sleep on MiMo.
what they have accomplished in such a short span is remarkable, their first (7B dense) llm was released exactly a year ago
We are releasing Star Elastic - turn ONE reasoning LLM into MANY sizes with a single post-training run.
360× cheaper than pretraining a family of models.
7× better than SOTA compression.
Split reasoning capability.
Plus elastic budget control that beats the accuracy-latency frontier.
Paper:
HF models:
Thread