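People are already saying "just set temp=0" in this thread. That doesn't make it deterministic. Run the same matmul on its own 1000x and it's bitwise identical every time, so the variance isn't from GPU randomness. Your output depends on the server's batch size at request time: a different batch size can change the order the floats get added in, and float addition isn't associative. temp=0 only fixes sampling, not the math underneath.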
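Toy sketch of the point above, in plain Python rather than a real GPU kernel. The names (`chunked_sum`, `greedy_pick`) and the numbers are made up for illustration, and the `chunk` argument is only a stand-in for a kernel splitting its reduction differently when the batch size changes. It shows the same inputs giving the same result on repeat runs, a different result under a different split, and a greedy (temp=0) pick flipping because of it.

```python
def sequential_sum(values):
    # Explicit loop on purpose: recent Python versions make the built-in
    # sum() more accurate for floats, which would hide the effect.
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk):
    # Hypothetical stand-in for a reduction kernel: sum each chunk,
    # then sum the partial results. Different chunk = different order.
    partials = [
        sequential_sum(values[i:i + chunk])
        for i in range(0, len(values), chunk)
    ]
    return sequential_sum(partials)


def greedy_pick(logits):
    # temp=0 decoding: always take the index of the largest logit.
    return max(range(len(logits)), key=lambda i: logits[i])


# Made-up terms chosen so rounding is easy to see.
terms = [1e16, 1.0, -1e16, 1.0]

# Same split, 1000 runs: always the same bits.
same_split_runs = {chunked_sum(terms, 4) for _ in range(1000)}

# Same terms, different split: different result.
logit_one_chunk = chunked_sum(terms, 4)
logit_two_chunks = chunked_sum(terms, 2)

# A competing logit of 0.5 sits between the two results,
# so the greedy pick changes even though nothing is sampled.
competitor = 0.5
pick_one_chunk = greedy_pick([logit_one_chunk, competitor])
pick_two_chunks = greedy_pick([logit_two_chunks, competitor])

print(len(same_split_runs), logit_one_chunk, logit_two_chunks)
print(pick_one_chunk, pick_two_chunks)
```

Real logits differ by far less than this, but a near-tie between two tokens is all it takes, and once one token flips the rest of the generation can diverge.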
I’m coming to the conclusion that the biggest challenge for Enterprise AI, and AI in general, right now is that it’s still impossible to guarantee that everyone gets the same answer to the same question, every time.
Which is a great response to the doomers. AI doesn’t know the consequences of its output.
Judgement and the ability to challenge AI output are becoming increasingly necessary, and valuable.
Which makes domain knowledge more valuable by the second.
Am I wrong?