I didn't expect DeepSeek v4 PRO (not Flash) to run well on the Mac Studio M3 Ultra with 512GB of RAM. This is 2 bit quantized with the same DwarfStar recipe used for Flash. 433GB GGUF file. 130 t/s prefill, 13 t/s generation. Prefill in the video is low because small prompt.
I must admit that nothing about computers, since I'm in love with the field, was so uninteresting as the Javascript different fashions, waves, frameworks, rewrites, hypes. And I'm one that loves almost every shit programming related.
@BereznevKi20669@ggerganov Yes I believe the real llama.cpp revolution is yet to happen at its full scale. As computers will have more RAM and models will improve, and *if* China will continue shipping large strong models with open weights, what will happen will have huge effects.