vLLM
@vllm_project
A high-throughput and memory-efficient inference and serving engine for LLMs. Join to discuss together with the community!
Joined March 2024
36 Following    38.7K Followers
This week's vLLM Office Hours: @AMD on trends in AI agent applications. Every contribution ships upstream in vLLM main.

The primitives agentic inference needs are all in vLLM today:
🧠 Prefix caching: automatic KV reuse across agent turns, lower TTFT
🦅 EAGLE / P-EAGLE spec decode: draft proposals verified in a single pass
🛠️ Tool calling: parallel calls + guided decoding for schema-compliant outputs
🌙 Mooncake KV connector: distributed KV offload for long agentic traces
💾 CPU KV offload: throughput gains once KV cache outgrows GPU memory
🧭 vLLM Semantic Router: route requests across small vs large models (joint work with @AIatAMD)

Full session 👇
[vLLM Office Hours #49] Latest Trends in AI Agent Applications and vLLM - May 14, 2026