Codex @OpenAI + @linear + @zeddotdev is all you need.
- Codex for feature development, terminal, git workflow
- Linear for shared team context, git diff review, and ticket management
- Zed for fast manual code editing
OpenAI dropped Symphony: agents that claim your Linear tickets, spin up isolated workspaces, and only ping you for review. Install by giving your AI a 2,000-line spec. Would you trust agents with your backlog?
🚀 Introducing FlashQLA: high-performance linear attention kernels built on TileLang.
⚡ 2–3× forward speedup. 2× backward speedup.
💻 Purpose-built for agentic AI on your personal devices.
💡 Key insights:
1. Gate-driven automatic intra-device context parallelism (CP).
2. Hardware-friendly algebraic reformulation.
3. TileLang fused warp-specialized kernels.
FlashQLA boosts SM utilization via automatic intra-device CP. The gains are especially pronounced for TP setups, small models, and long-context workloads.
Instead of fusing the entire Gated DeltaNet (GDN) flow into a single kernel, we split it into two kernels optimized for CP and backward efficiency. At large batch sizes this incurs extra memory I/O overhead vs. a fully fused approach, but it delivers better real-world performance on edge devices and long-context workloads.
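For readers new to the idea, here is a minimal pure-Python sketch of why a gated linear recurrence can be parallelized along the sequence at all. It assumes "CP" means splitting one sequence into chunks that are processed independently on the same device. It uses a simplified scalar-gated linear attention (S_t = g_t · S_{t-1} + k_tᵀ v_t, o_t = q_t S_t), not the GDN delta rule and not FlashQLA's actual kernels. Every function name below is hypothetical.

```python
# Illustrative sketch only: simplified scalar-gated linear attention.
# Not FlashQLA's algorithm; all names here are hypothetical.

def zeros(dk, dv):
    return [[0.0] * dv for _ in range(dk)]

def step(S, kt, vt, gt):
    """One recurrence step: S_new = gt * S + outer(kt, vt)."""
    return [[gt * S[i][j] + kt[i] * vt[j] for j in range(len(vt))]
            for i in range(len(kt))]

def scan(q, k, v, g, S0):
    """Run the recurrence from state S0; return (outputs, final state)."""
    S, out = S0, []
    for qt, kt, vt, gt in zip(q, k, v, g):
        S = step(S, kt, vt, gt)
        out.append([sum(qt[i] * S[i][j] for i in range(len(qt)))
                    for j in range(len(vt))])
    return out, S

def sequential(q, k, v, g):
    """Reference: one pass over the whole sequence."""
    return scan(q, k, v, g, zeros(len(k[0]), len(v[0])))[0]

def chunk_summary(k, v, g):
    """Work that needs no incoming state: total decay and local state."""
    decay, S = 1.0, zeros(len(k[0]), len(v[0]))
    for kt, vt, gt in zip(k, v, g):
        decay *= gt
        S = step(S, kt, vt, gt)
    return decay, S

def chunk_parallel(q, k, v, g, chunk):
    n, dk, dv = len(q), len(k[0]), len(v[0])
    spans = [(a, min(a + chunk, n)) for a in range(0, n, chunk)]

    # Phase 1: independent per chunk, so chunks could run concurrently.
    summaries = [chunk_summary(k[a:b], v[a:b], g[a:b]) for a, b in spans]

    # Phase 2: short sequential scan over chunk boundaries only.
    # By linearity, S_end = decay * S_start + local_state.
    S, starts = zeros(dk, dv), []
    for decay, local in summaries:
        starts.append(S)
        S = [[decay * S[i][j] + local[i][j] for j in range(dv)]
             for i in range(dk)]

    # Phase 3: independent per chunk again, replayed from its start state.
    out = []
    for (a, b), S0 in zip(spans, starts):
        out.extend(scan(q[a:b], k[a:b], v[a:b], g[a:b], S0)[0])
    return out
```

Only Phase 2 is serial, and its length is the number of chunks rather than the number of tokens. That is the general reason chunking can keep more of a device busy on long sequences. How FlashQLA chooses chunk boundaries from the gates is not shown here.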
The backward pass was the hardest part: we built a 16-stage warp-specialized pipeline under extremely tight on-chip memory constraints, ultimately achieving 2×+ kernel-level speedups.
We hope this is useful to the community! 🫶🫶
Learn more:
📖 Blog:
💻 Code: