Humanity's Last Exam (HLE) is a rigorous intelligence benchmark featuring more than 2,500 problems crafted by experts in mathematics, the natural sciences, engineering, and the humanities. Most models achieve single-digit accuracy. Grok 4 and Grok 4 Heavy outperform all others.