
Pavlo Molchanov
@PavloMolchanov
Director of Research @NVIDIA
Joined March 2014
436 Following    3.9K Followers
We are releasing Star Elastic - turn ONE reasoning LLM into MANY sizes with a single post-training run. 360× cheaper than pretraining a family of models. 7× better than SOTA compression. Split reasoning capability. Plus elastic budget control that beats the accuracy-latency frontier. Paper: HF models: Thread 👇