Google dropped MTP (multi-token prediction) versions of Gemma 4. Ran them on my DGX Spark.
The 31B dense model went from 3.94 → 8.91 tok/s. That's +126%.
Full results:
[26B A4B]
> 25.24 → 31.69 tok/s (+25.6%)
> TTFT 755 → 332ms (-56%)
[31B]
> 3.94 → 8.91 tok/s (+126%)
> TTFT 599 → 378ms (-37%)
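Quick sanity check on the percentages above. This is a minimal sketch that recomputes the relative changes from the before/after numbers in this post; the `pct_change` helper and the `results` layout are just illustrative names, not part of any benchmark tool.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Before/after numbers copied from the results above (baseline, with MTP).
results = {
    "26B A4B": {"tok/s": (25.24, 31.69), "TTFT ms": (755, 332)},
    "31B": {"tok/s": (3.94, 8.91), "TTFT ms": (599, 378)},
}

for model, metrics in results.items():
    for name, (before, after) in metrics.items():
        print(f"[{model}] {name}: {before} -> {after} ({pct_change(before, after):+.1f}%)")
```

Higher is better for tok/s, lower is better for TTFT, so a negative TTFT change is an improvement.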
If you're not running MTP, you're leaving free perf on the table.
Gemma 4: Now up to 3x Faster. ⚡
Same quality, way more speed. Our new MTP drafters allow Gemma 4 to predict multiple tokens at once, effectively tripling your output speed without compromising intelligence.
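The announcement doesn't spell out the mechanism, so here is a toy sketch of the general draft-and-verify idea (greedy speculative decoding), assuming the MTP drafters are used that way. Everything in it is hypothetical: `toy_target` and `toy_drafter` are stand-in functions over integer tokens, not Gemma 4 or any real library API. The point it illustrates is why speed can go up with the same output: the main model still decides every token, so the result matches plain greedy decoding, but it needs fewer sequential rounds.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # maps a context to the next token


def speculative_step(target: NextToken, drafter: NextToken,
                     context: List[int], k: int) -> List[int]:
    """One greedy draft-and-verify round; returns the tokens that were accepted."""
    # 1) The cheap drafter proposes k tokens, one after another.
    draft: List[int] = []
    for _ in range(k):
        draft.append(drafter(context + draft))

    # 2) The target checks each drafted position. A real system would score all
    #    positions in one batched forward pass; this toy loops for clarity.
    accepted: List[int] = []
    for tok in draft:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # keep the target's token, drop the rest
            return accepted
        accepted.append(tok)

    # 3) Every drafted token matched, so the target adds one more token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, drafter: NextToken, prompt: List[int],
             n_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Generate n_tokens tokens; also return how many verify rounds were needed."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, drafter, out, k))
        rounds += 1
    return out[:len(prompt) + n_tokens], rounds


# Hypothetical toy models: the target counts upward mod 10; the drafter agrees
# with it except right after a token t where t % 3 == 2.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[int]) -> int:
    return 0 if ctx[-1] % 3 == 2 else (ctx[-1] + 1) % 10


tokens, rounds = generate(toy_target, toy_drafter, prompt=[0], n_tokens=12)
print(tokens, rounds)
```

In this toy the drafter is wrong fairly often and the loop still finishes in fewer rounds than tokens generated. Real speedups depend on how often the drafter's guesses are accepted and on the hardware.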
We just released Gemma 4 — our most intelligent open models to date.
Built from the same world-class research as Gemini 3, Gemma 4 brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows.
Released under a commercially permissive Apache 2.0 license so anyone can build powerful AI tools. 🧵↓