@AppenResearch independently evaluated
@subquadratic's SSA kernel, a learned sparse attention mechanism designed to mitigate the quadratic scaling cost of full attention.
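For context on the general technique (the report doesn't publish the kernel internals, so everything below is an illustrative sketch, not the SSA implementation): learned block-sparse attention scores blocks of keys per block of queries and attends only to the highest-scoring ones. The function name, the mean-pooled block scorer, and the `block_size`/`top_k` values here are all assumptions for illustration.

```python
# Illustrative sketch of learned block-sparse attention -- NOT the SSA kernel.
import torch
import torch.nn.functional as F

def learned_block_sparse_attention(q, k, v, block_size=64, top_k=8):
    """q, k, v: (batch, heads, seq_len, head_dim); seq_len divisible by block_size."""
    B, H, N, D = q.shape
    nb = N // block_size
    qb = q.view(B, H, nb, block_size, D)
    kb = k.view(B, H, nb, block_size, D)
    vb = v.view(B, H, nb, block_size, D)

    # Coarse block summaries (mean pooling) stand in for a learned scorer.
    q_sum = qb.mean(dim=3)                               # (B, H, nb, D)
    k_sum = kb.mean(dim=3)                               # (B, H, nb, D)
    block_scores = q_sum @ k_sum.transpose(-1, -2)       # (B, H, nb, nb)

    # Keep only the top-k key blocks per query block, so per-query work is
    # O(top_k * block_size) instead of O(N).
    top_idx = block_scores.topk(min(top_k, nb), dim=-1).indices  # (B, H, nb, k)

    out = torch.zeros_like(qb)
    for i in range(nb):  # a real kernel fuses this loop; kept explicit for clarity
        idx = top_idx[:, :, i]                                    # (B, H, k)
        gather = idx[..., None, None].expand(-1, -1, -1, block_size, D)
        k_sel = kb.gather(2, gather).flatten(2, 3)                # (B, H, k*block_size, D)
        v_sel = vb.gather(2, gather).flatten(2, 3)
        attn = F.softmax(qb[:, :, i] @ k_sel.transpose(-1, -2) / D**0.5, dim=-1)
        out[:, :, i] = attn @ v_sel
    return out.view(B, H, N, D)
```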
Key results (context lengths up to 1M tokens):
- 56.2× wall-clock speedup vs. FlashAttention-2 (FA2)
- 62.8× FLOP reduction, validated via torch.profiler with <4% variance from the theoretical count (measurement sketch below)
- 95.6% average score across RULER tasks at 128K
- 86.2% average score on the hardest MRCR 8-needle bucket (512K–1M contexts)
- 81.8% SWE-Bench Verified resolved rate
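As referenced in the FLOP-reduction bullet, operator-level FLOP counts can be collected with torch.profiler's `with_flops=True` option and compared against a hand-derived theoretical count. The shapes and the plain matmul attention below are placeholders, not the evaluated SSA kernel or the report's exact methodology.

```python
# Sketch of FLOP measurement with torch.profiler; placeholder workload only.
import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(1, 8, 4096, 64, device=device)
k, v = torch.randn_like(q), torch.randn_like(q)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             with_flops=True) as prof:
    # Replace with the kernel under test; explicit matmuls shown so the
    # profiler's FLOP formulas (mm/bmm) apply.
    attn = (q @ k.transpose(-2, -1) / 64 ** 0.5).softmax(dim=-1)
    out = attn @ v

# Sum operator-level FLOP estimates and compare against the theoretical count.
measured_flops = sum(evt.flops for evt in prof.key_averages() if evt.flops)
print(f"measured GFLOPs: {measured_flops / 1e9:.2f}")
```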
Full report: