This is the first publicly benchmarked time-horizon snapshot we have of GPT 5.5, at least at the 2.5 million token cap.
Models are getting more token-efficient and can work for longer!
Our evaluations show that frontier AI's cyber capabilities are advancing quickly. The length of cyber tasks frontier models can complete has been doubling every few months, and the pace is accelerating, with recent models exceeding our previous trends. 🧵