stevibe
@stevibe
LLM. Local AI addict. Building @BenchLocalApp. Builds things nobody asked for. Benchmarks things for fun.
Joined July 2009
1.3K Following    21.8K Followers
Google dropped MTP versions of Gemma4. Ran them on my DGX Spark.

The 31B dense model went from 3.94 → 8.91 tok/s. That's +126%.

Full results:

[26B A4B]
> 25.24 → 31.69 tok/s (+25.6%)
> TTFT 755 → 332ms (-56%)

[31B]
> 3.94 → 8.91 tok/s (+126%)
> TTFT 599 → 378ms (-37%)

If you're not running MTP, you're leaving free perf on the table.
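For anyone who wants to sanity-check the percentages, here's a quick sketch that recomputes them from the before/after numbers in the post. The `pct_change` helper and the `results` labels are made up for illustration; only the figures come from the post.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (label, before, after) copied from the post; labels are illustrative only.
results = [
    ("26B A4B tok/s", 25.24, 31.69),
    ("26B A4B TTFT ms", 755, 332),
    ("31B tok/s", 3.94, 8.91),
    ("31B TTFT ms", 599, 378),
]

for label, before, after in results:
    # Positive is a gain for tok/s; negative is a gain for TTFT (lower latency).
    print(f"{label}: {pct_change(before, after):+.1f}%")
```

Rounded to whole percents, this gives the +126%, -56%, and -37% quoted above, plus +25.6% for the 26B A4B throughput.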