We tested Claude, GPT-5.2, and Gemini as scientific paper reviewers on
@KurateOrg. When scoring arXiv papers on impact (1–10), GPT clusters around 7–8 and Gemini is similar. Claude Opus 4.6 uses the full range, making it a far better discriminator. See the distributions 👇