
Search results for ParallelEVM
Tweets including ParallelEVM
🇭🇰 The Scaling Summit HK 2026 | Session Recap
Host: @499_DAO × City University of Hong Kong
Co-host: @0G_labs × OpenSchool × IDM of CityU
Special Partner: @BAI_AGI × @hetu_protocol
📌 Topic: High-Performance Execution Layer & Parallel EVM
Programmable intelligence demands extreme performance. ⚡ Top system architects dissected the high-performance execution layer. From parallel EVM to decentralized agentic routing, we explored what it truly takes to process millions of concurrent, autonomous transactions.
🎙️ Speakers & Mod:
▪️ @SerenaSeek, Founder of @BlockCentral_ai (Mod)
▪️ @jinglingcookies, AI Lead at @monad
▪️ @kinnnnnnn_____, CEO of @letsburnlab
▪️ @0xLaughing, APAC Lead of @GoKiteAI
#TheScalingSummit# #Infrastructure# #ParallelEVM#
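To ground the parallel-EVM theme in something concrete: a minimal sketch, assuming a simple account-based state model, of how an execution layer can batch transactions with disjoint read/write sets and run each batch concurrently. The names here (Tx, schedule, apply_tx) are illustrative, not any chain's actual scheduler.

```python
# Toy sketch of the core parallel-EVM idea: transactions touching
# disjoint state can run concurrently; conflicting ones are serialized
# into later batches. Illustrative only, not a real EVM scheduler.

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Tx:
    sender: str
    receiver: str
    amount: int

    def accounts(self) -> frozenset[str]:
        return frozenset({self.sender, self.receiver})

def schedule(txs: list[Tx]) -> list[list[Tx]]:
    """Greedily pack txs into batches with pairwise-disjoint account sets."""
    batches: list[tuple[set[str], list[Tx]]] = []
    for tx in txs:
        for touched, batch in batches:
            if touched.isdisjoint(tx.accounts()):
                touched |= tx.accounts()
                batch.append(tx)
                break
        else:
            batches.append((set(tx.accounts()), [tx]))
    return [batch for _, batch in batches]

def apply_tx(balances: dict[str, int], tx: Tx) -> None:
    balances[tx.sender] -= tx.amount
    balances[tx.receiver] += tx.amount

balances = {"alice": 100, "bob": 50, "carol": 10, "dave": 0}
txs = [Tx("alice", "bob", 5), Tx("carol", "dave", 3), Tx("alice", "carol", 2)]

# Batches run one after another; txs *within* a batch run in parallel,
# which is safe because their account sets are disjoint.
for batch in schedule(txs):
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda t: apply_tx(balances, t), batch))
print(balances)  # {'alice': 93, 'bob': 55, 'carol': 9, 'dave': 3}
```

Here the first two transfers touch disjoint accounts and execute concurrently; the third conflicts with the first on alice's account, so it lands in a second batch.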
New course: Efficient Inference with SGLang: Text and Image Generation, built in partnership with LMSys @lmsysorg and RadixArk @radixark, and taught by Richard Chen @richardczl, a Member of Technical Staff at RadixArk.
Running LLMs in production is expensive, and much of that cost comes from redundant computation. This short course teaches you to eliminate that waste using SGLang, an open-source inference framework that caches computation it has already done and reuses it across future requests. When ten users share the same system prompt, SGLang processes it once, not ten times. The speedups compound quickly, especially when there's a lot of shared context across requests.
Skills you'll gain:
- Implement a KV cache from scratch to eliminate redundant computation within a single request
- Scale caching across users and requests with RadixAttention, so shared context is only processed once
- Accelerate image generation with diffusion models using SGLang's caching and multi-GPU parallelism
Join and learn to make LLM inference faster and more cost-efficient at scale!
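To make the shared-prefix reuse concrete, here is a minimal from-scratch sketch of a prefix KV cache, loosely in the spirit of RadixAttention. The embeddings, projections, and PrefixKVCache class are toy stand-ins, not SGLang's implementation or API.

```python
# Toy prefix KV cache: a new request reuses the KV tensors of its
# longest already-cached prefix and computes only the uncached suffix.
# Not SGLang's code; just the caching idea in miniature.

import numpy as np

D = 16  # toy head dimension
rng = np.random.default_rng(0)
W_k = rng.standard_normal((D, D))
W_v = rng.standard_normal((D, D))

def embed(token: str) -> np.ndarray:
    """Toy per-token embedding (consistent within a run)."""
    h = abs(hash(token)) % (2**32)
    return np.random.default_rng(h).standard_normal(D)

def kv_for(tokens: list[str]) -> tuple[np.ndarray, np.ndarray]:
    """Compute K and V for a span of tokens (the 'expensive' step)."""
    X = np.stack([embed(t) for t in tokens])
    return X @ W_k, X @ W_v

class PrefixKVCache:
    def __init__(self):
        self.store: dict[tuple, tuple] = {}
        self.tokens_computed = 0

    def prefill(self, tokens: list[str]) -> tuple[np.ndarray, np.ndarray]:
        # Find the longest cached prefix of this request.
        best = 0
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self.store:
                best = n
                break
        if best:
            K_hit, V_hit = self.store[tuple(tokens[:best])]
        else:
            K_hit, V_hit = np.empty((0, D)), np.empty((0, D))
        # Compute KV only for the uncached suffix.
        suffix = tokens[best:]
        if suffix:
            K_new, V_new = kv_for(suffix)
            self.tokens_computed += len(suffix)
            K, V = np.vstack([K_hit, K_new]), np.vstack([V_hit, V_new])
        else:
            K, V = K_hit, V_hit
        # Cache every prefix so future requests can hit partial matches.
        for n in range(1, len(tokens) + 1):
            self.store[tuple(tokens[:n])] = (K[:n], V[:n])
        return K, V

cache = PrefixKVCache()
system = "You are a helpful assistant .".split()  # 6 shared tokens
for i in range(10):
    cache.prefill(system + [f"user_question_{i}"])
# Naively: 10 requests x 7 tokens = 70. With prefix reuse, the shared
# system prompt is computed once: 6 + 10 unique tokens = 16.
print(cache.tokens_computed)  # 16
```

Ten requests sharing a six-token system prompt cost 16 computed token positions instead of 70, which is the compounding effect the course describes.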
We’re excited to welcome Mooncake to the PyTorch Ecosystem! Mooncake is designed to solve the “memory wall” in LLM serving. Integrating Mooncake’s high-performance KVCache transfer and storage capabilities with PyTorch-native inference engines like SGLang, vLLM, and TensorRT-LLM unlocks new levels of throughput and scalability for large language model deployments. Mooncake enables prefill-decode disaggregation, global KVCache reuse, and elastic expert parallelism, and serves as a fault-tolerant PyTorch distributed backend. 🔗 #PyTorch# #OpenSourceAI# #LLM# #AIInfrastructure#
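To illustrate what prefill-decode disaggregation over a shared KVCache pool means in practice, here is a hypothetical sketch. KVStore, prefill_worker, and decode_worker are invented names for exposition, not Mooncake's actual interfaces; the point is the division of labor between the two phases.

```python
# Toy sketch of prefill-decode disaggregation with a global KV store.
# All names here are illustrative inventions, not Mooncake's API.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class KVStore:
    """Stands in for a shared, network-accessible KVCache pool."""
    blobs: dict[str, list[float]] = field(default_factory=dict)

    def put(self, key: str, kv: list[float]) -> None:
        self.blobs[key] = kv        # in Mooncake this is a fast network transfer

    def get(self, key: str) -> Optional[list[float]]:
        return self.blobs.get(key)  # global reuse: any decoder can fetch it

def prefill_worker(store: KVStore, request_id: str, prompt: str) -> None:
    """Compute-heavy phase: build the KV cache for the whole prompt."""
    kv = [float(ord(c)) for c in prompt]  # stand-in for real attention KV
    store.put(request_id, kv)

def decode_worker(store: KVStore, request_id: str, steps: int) -> str:
    """Latency-sensitive phase: generate tokens against the fetched cache."""
    kv = store.get(request_id)
    assert kv is not None, "prefill must publish the cache first"
    # Pretend decoding: each step emits one token derived from the cache.
    return "".join(chr(int(kv[i % len(kv)]) % 26 + 97) for i in range(steps))

store = KVStore()
prefill_worker(store, "req-1", "Explain the memory wall in LLM serving.")
print(decode_worker(store, "req-1", steps=8))
```

Separating the two phases lets compute-bound prefill and memory-bandwidth-bound decode scale on different hardware pools, with the KVCache store as the handoff point.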