Grok 4 Heavy (left) vs. Gemini 2.5 Pro (right)
Create a Turing-complete Scheme interpreter in C that supports lexical scoping, closures, continuations, and proper tail calls, so that tail recursion runs without stack growth.
Grok 4 Heavy won. It wrote the superior code.
Grok 4 Heavy: 903 lines of C code.
Gemini 2.5 Pro: 891 lines of C code.
Both compiled!
The code from Grok 4 Heavy worked flawlessly.
The code from Gemini 2.5 Pro did not work even after multiple prompts.
Grok 4 Heavy: ~10 minutes, single prompt.
Gemini 2.5 Pro: ~2-3 minutes per prompt; after about 10 prompts I gave up.
Full Prompt👇