Pavlo Molchanov
@PavloMolchanov
Director of Research @NVIDIA
Joined March 2014
436 Following    3.9K Followers
We are releasing Star Elastic - turn ONE reasoning LLM into MANY sizes with a single post-training run. 360× cheaper than pretraining a family of models. 7× better than SOTA compression. Split reasoning capability. Plus elastic budget control that beats the accuracy-latency frontier. Paper: HF models: Thread 👇