Love it. Well done.
Google dropped MTP (multi-token prediction) versions of Gemma4. Ran them on my DGX Spark.
The 31B dense model went from 3.94 → 8.91 tok/s. That's +126%.
Full results:
[26B A4B]
> 25.24 → 31.69 tok/s (+25.6%)
> TTFT 755 → 332ms (-56%)
[31B]
> 3.94 → 8.91 tok/s (+126%)
> TTFT 599 → 378ms (-37%)
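If you want to sanity-check the percentages, here's a quick sketch. The numbers are the ones above; the `pct_change` helper and the dictionary labels are just names made up for this illustration, not part of any benchmark tool.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the results above.
# Labels are hypothetical; units are tok/s for throughput and ms for TTFT.
results = {
    "26B A4B tok/s": (25.24, 31.69),
    "26B A4B TTFT ms": (755, 332),
    "31B tok/s": (3.94, 8.91),
    "31B TTFT ms": (599, 378),
}

for name, (before, after) in results.items():
    print(f"{name}: {before} -> {after} ({pct_change(before, after):+.1f}%)")
```

Higher is better for tok/s and lower is better for TTFT, so a positive change on throughput and a negative change on TTFT are both wins.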
If you're not running MTP, you're leaving free perf on the table.