We've actually gone farther than this. Nemotron 3 Super (120B-12A) was pretrained on 25T tokens in NVFP4. Nemotron 3 Ultra was also pretrained in NVFP4.
This research paper advances the state of NVFP4 pretraining, but it is not just research: we are using NVFP4 for our most important pretraining work.
We've gone even farther:
Nemotron 3 Super is 120B and pretrained on 25T tokens in NVFP4.
Nemotron 3 Ultra is ~500B and also pretrained in NVFP4.
Accelerated computing means we rethink every aspect of the AI stack, looking for new opportunities to improve efficiency.