
Aliez Ren (@aliez_ren)

2026.04.22 05:47

Great work! Tested on my 4x RTX Pro 6000 (Workstation Edition, but power-limited to 300W each) with PCIe 4.0:
- tp=2, pp=2: prefill 1570, decode 34
- tp=4, pp=1: prefill 967, decode 49

My Dockerfile:

0xSero (@0xSero)

2026.04.21 12:25

GLM-5.1-478B-NVFP4

Running on:
- 4x RTX Pro 6000
- SGLang
- 370,000 max tokens (1.75x full context)
- p10 27.7 | p90 45.6 tok/s decode (gen)
- 1340 tok/s prefill

I could get 2x decode if I limit to 64k context (100 tok/s)

In this video it operates Figma (:
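
For a rough sense of the trade-off between the two configurations in the reply, here is a minimal sketch that only does arithmetic on the reported numbers. It assumes the figures are tokens per second (the reply does not state units; the quoted post uses tok/s), and the dictionary layout and the `ratio` helper are illustrative names, not anything from either post.

```python
# Figures as reported in the reply; units are ASSUMED to be tokens/s.
results = {
    "tp=2, pp=2": {"prefill": 1570, "decode": 34},
    "tp=4, pp=1": {"prefill": 967, "decode": 49},
}


def ratio(metric: str, numerator: str, denominator: str) -> float:
    """Return how many times faster `numerator` is than `denominator` on `metric`."""
    return results[numerator][metric] / results[denominator][metric]


# tp=2, pp=2 has the higher prefill figure; tp=4, pp=1 has the higher decode figure.
prefill_gain = ratio("prefill", "tp=2, pp=2", "tp=4, pp=1")
decode_gain = ratio("decode", "tp=4, pp=1", "tp=2, pp=2")

print(f"prefill: tp=2, pp=2 is {prefill_gain:.2f}x of tp=4, pp=1")
print(f"decode:  tp=4, pp=1 is {decode_gain:.2f}x of tp=2, pp=2")
```

On these numbers, tp=2, pp=2 comes out around 1.6x on prefill, while tp=4, pp=1 comes out around 1.4x on decode, so which one is preferable depends on whether the workload is prompt-heavy or generation-heavy.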