My data point: working on two projects in parallel with Pi + llama.cpp + Qwen-3.6-35B-A3B (I prefer the MoE)
This works on my M1 Max (64 GB), which I bought 4.5 years ago. "Works" as in "you can get work done", not just "runs for a demo".
Every model added to transformers should be available on Apple Silicon right away. We built a Skill and test harness for mlx-lm to get us closer to that goal.
It's designed to help contributors AND support reviewers.
Read on to see what we did and why it matters.