We tested Claude, GPT-5.2, and Gemini as scientific paper reviewers on
@KurateOrg. When scoring arXiv papers on impact (1–10), GPT-5.2 clusters around 7–8, and Gemini does much the same. Claude Opus 4.6 uses the full range, which makes it a far better discriminator. See the distributions 👇