Floating point math is not associative! And many of the highest performance kernels split the workload among SMs and accumulate partial results in a nondeterministic order. Many AI labs just accept this, or pay a huge performance penalty for determinism. DeepSeek decided to do neither. (1/4) 🧵
Show more