Search results for TTFT
Tweets including TTFT
The AI memory wall is costing you throughput. Diamond Partner @WEKA fixes it at the infrastructure layer: 1000x KV-cache expansion, 20x faster #TTFT. See it at SuperAI Singapore, June 10–11.
Google dropped MTP versions of Gemma4. Ran them on my DGX Spark. The 31B dense model went from 3.94 → 8.91 tok/s. That's +126%.
Full results:
[26B A4B]
> 25.24 → 31.69 tok/s (+25.6%)
> TTFT 755 → 332ms (-56%)
[31B]
> 3.94 → 8.91 tok/s (+126%)
> TTFT 599 → 378ms (-37%)
If you're not running MTP, you're leaving free perf on the table.
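The percentages quoted in the post above follow from the before/after numbers by a plain relative-change calculation. A minimal sketch for re-deriving them; the helper name `pct_change` and the dictionary labels are illustrative and not taken from the post:

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Before/after pairs copied from the post; the keys are illustrative labels.
results = {
    "26B A4B throughput (tok/s)": (25.24, 31.69),
    "26B A4B TTFT (ms)": (755, 332),
    "31B throughput (tok/s)": (3.94, 8.91),
    "31B TTFT (ms)": (599, 378),
}

for name, (before, after) in results.items():
    # Positive is good for throughput; negative is good for TTFT.
    print(f"{name}: {before} -> {after} ({pct_change(before, after):+.1f}%)")
```

Rounded to the precision used in the post, the computed changes agree with the quoted figures.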
Most agentic stacks run into the same problems pretty quickly: reasoning and tool parsing drift across turns, KV cache reuse falls apart, or tools fire too late. We’ve been hardening Dynamo’s harness-facing path so @Claudeai Code, @OpenClaw, and @openai Codex-style agent patterns behave reliably on custom stacks and inference endpoints:
• Stable prompts for KV reuse and lower TTFT
• Interleaved reasoning + tool calls preserved across turns
• Streaming tool dispatch instead of end-of-turn buffering
• Harness behavior aligned with real multi-turn agent runtimes
If you’re building your own agent stack or serving endpoint, this blog goes through the infrastructure issues that tend to show up in practice and the patterns we’ve been using to fix them. Tech blog ➡️
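Why stable prompts matter for KV reuse can be illustrated with a toy calculation: a prefix cache can only reuse the part of the new prompt that exactly matches the start of the previous one. The sketch below is not Dynamo's implementation. It assumes whitespace-split words as a stand-in for real tokenizer output, and all function names and prompts are hypothetical.

```python
from itertools import takewhile


def shared_prefix_len(prev_tokens, next_tokens):
    """Number of leading tokens that are identical in both sequences."""
    pairs = zip(prev_tokens, next_tokens)
    return sum(1 for _ in takewhile(lambda p: p[0] == p[1], pairs))


def reusable_fraction(prev_prompt, next_prompt):
    """Fraction of the next prompt covered by the shared prefix.

    Assumption: whitespace-split words approximate tokens.
    """
    prev_tokens = prev_prompt.split()
    next_tokens = next_prompt.split()
    if not next_tokens:
        return 0.0
    return shared_prefix_len(prev_tokens, next_tokens) / len(next_tokens)


# Stable prompt: turn 2 extends turn 1 verbatim, so the whole of turn 1 is reusable.
stable_turn_1 = "SYSTEM you are a coding agent USER list files"
stable_turn_2 = stable_turn_1 + " ASSISTANT ls USER open main.py"

# Unstable prompt: a timestamp near the start changes every turn,
# so the shared prefix ends almost immediately.
unstable_turn_1 = "SYSTEM time=12:00 you are a coding agent USER list files"
unstable_turn_2 = (
    "SYSTEM time=12:01 you are a coding agent USER list files "
    "ASSISTANT ls USER open main.py"
)

print(reusable_fraction(stable_turn_1, stable_turn_2))
print(reusable_fraction(unstable_turn_1, unstable_turn_2))
```

In this toy setup the stable conversation reuses most of the second prompt, while the one with the changing timestamp reuses only its first token. Real prefix caches typically match on blocks of token IDs rather than words, but the same reasoning applies: anything that mutates early in the prompt forces prefill to start over, which raises TTFT.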
We push Prefill/Decode disaggregation beyond a single cluster: cross-datacenter + heterogeneous hardware, unlocking the potential for significantly lower cost per token. This was previously blocked by KV cache transfer overhead. The key enabler is our hybrid model (Kimi Linear), which reduces KV cache size and makes cross-DC PD practical.
Validated on a 20x scaled-up Kimi Linear model:
✅ 1.54× throughput
✅ 64% ↓ P90 TTFT
→ Directly translating into lower token cost.
More in Prefill-as-a-Service: