the web got fast because of CDNs: instead of fetching the same page from across the planet every time, it's stored nearby and reused.
ai inference never had that. so every time a model sees the same context, your long chat history, the same rag documents, the same system prompt, it recomputes all of it from scratch, burning gpu you pay for.
a free open source layer finally built the CDN for ai: store what the model already worked out, reuse it everywhere. on repetitive workloads the project reports 3-10x faster responses.