elvis(@omarsar0): NEW research from FAIR at Meta, Cornell, and CMU.

This paper is a bigger deal than it seems. Apparently, you don't need billions of parameters to teach an AI model to reason.

The default approach to post-training language models for reasoning today remains finetuning millions or even billions of parameters. But what if the signal needed for reasoning is far sparser than we assume?

This new research introduces TinyLoRA, a method that scales low-rank adapters down to as few as a single trainable parameter. Using TinyLoRA with RL, they trained Qwen2.5-7B to 91% accuracy on GSM8K with only 13 parameters in bf16. That's 26 total bytes.

So what's the idea? RL and SFT require fundamentally different amounts of model capacity. SFT must absorb the full demonstration, encoding both task-relevant structure and irrelevant noise into the update. RL receives a sparser, cleaner signal. The reward separates what matters from what doesn't, so resampling amplifies useful information while noise cancels out.

Here are the results:

On GSM8K, models trained with GRPO reach 90% accuracy with fewer than 100 parameters. Models of the same capacity trained with SFT barely outperform the base model.

On harder benchmarks like MATH500, AIME, and AMC, finetuning just 196 parameters retains 87% of the absolute performance improvement averaged across six benchmarks.

The trend scales with model size, too. Larger models need proportionally smaller updates, suggesting trillion-scale models may be trainable for many tasks with just a handful of parameters.

The key takeaway is that reasoning may already live inside pretrained models. RL doesn't inject new knowledge; it surfaces what's already there, and it can do so with almost no parameter change at all.

Paper: https://t.co/L7RQ6zii1I

Learn to build effective AI agents in our academy:
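For a sense of scale, the 26-byte figure quoted in the post can be checked with a few lines of arithmetic. The sketch below is only illustrative: the helper names (`adapter_bytes`, `lora_param_count`) and the 4096 x 4096 layer shape are assumptions made up for this comparison, not anything from the paper, and the LoRA formula is the standard one for a rank-r adapter, not TinyLoRA's own parameterization.

```python
# Back-of-the-envelope storage arithmetic for tiny adapters.
# All names and the example layer shape are hypothetical, for illustration only.

BF16_BYTES = 2  # bfloat16 stores each value in 16 bits = 2 bytes


def adapter_bytes(n_params: int, bytes_per_param: int = BF16_BYTES) -> int:
    """Size in bytes of an adapter with n_params trainable values."""
    return n_params * bytes_per_param


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in a standard LoRA update B @ A,
    where B is (d_out x rank) and A is (rank x d_in)."""
    return rank * (d_out + d_in)


# The 13-parameter adapter from the post, stored in bf16.
tiny_size = adapter_bytes(13)

# For contrast: a rank-1 standard LoRA on one hypothetical 4096 x 4096 matrix.
rank1_params = lora_param_count(4096, 4096, rank=1)
rank1_size = adapter_bytes(rank1_params)

print(f"13-parameter adapter: {tiny_size} bytes")
print(f"rank-1 LoRA on one 4096x4096 layer: {rank1_params} params, {rank1_size} bytes")
```

Even the smallest conventional LoRA on a single layer of that assumed shape carries thousands of parameters, which is why a 13-parameter update stands out.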

2026.02.08 14:48
