I built autoresearch-rl and pointed it at GRPO fine-tuning on
@basilic_ai A100s. One command. 15 iterations. Zero human intervention. 100% infrastructure success rate.
GSM8K pass
@1: 26% baseline to 36%.
The hard part wasn't the search algorithm. It was the infrastructure.