so we built psql_bm25s.
exact BM25 retrieval. native Postgres access method. ~23x faster than pg_search on the standard benchmark.
retrieval stops being a budget item. the harness stops rationing. the agent gets to look things up like it should have the whole time.
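"Exact" BM25 retrieval most likely means every candidate document is scored with the full formula, with no approximate top-k shortcut. Below is a minimal pure-Python sketch of Okapi BM25. It assumes whitespace tokenization, the common defaults k1 = 1.5 and b = 0.75, and Lucene-style IDF. It is not psql_bm25s's implementation or API, and `bm25_scores` / `top_k` are hypothetical names used only for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    Hypothetical helper: whitespace tokenization and Lucene-style IDF are
    assumptions. Every document is scored, so the resulting ranking is exact.
    """
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # absent terms contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


def top_k(query, docs, k=10):
    """Return indices of the k highest-scoring documents (hypothetical helper)."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    corpus = [
        "postgres bm25 index",
        "agents stop early",
        "bm25 bm25 ranking in postgres",
    ]
    print(top_k("bm25 postgres", corpus, k=2))
```

Documents that share no term with the query score zero. An index-backed engine can therefore walk only the posting lists for the query terms and still produce the same exact ranking.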
New research: long-running agents often fail because they stop too early, not because the model can't make progress.
We tested 5 harness designs across 8 long-horizon coding tasks.
Our new orchestration harness, Zenith, wins 5/8 at 43% of the cost of the strongest baseline.
We are happy to share early results from Logos, our novel first-principles augmented intelligence system, which has enabled insightful results across domains.
We start with a series of results in physics.
Today's is a lovely result that has been hiding in Special Relativity for 121 years.