Search diffusion on X — X Web Viewer

2 hours ago

"The conventional wisdom among policymakers and industry executives is that America and China are running different AI races: China cares about translating AI advances into economic and military power (diffusion), while America cares about developing the most advanced AI models at the frontier (innovation). But this narrative is misleading and overstates the differences between the two countries." Bingo

0

1

0


Tanishq Mathew Abraham, Ph.D.@iScienceLuvr

2026.05.12 08:35

I'm a simple man, I see a Kaiming He paper, I click.
ELF: Embedded Language Flows
This is very interesting, getting continuous diffusion models working for text!
"Unlike existing DLMs, ELF predominantly stays within the continuous embedding space until the final time step, where it maps to discrete tokens using a shared-weight network."
@sedielem you might like this one!

0

13

801

93


Anthropic@AnthropicAI

2026.05.07 13:51

We’re sharing the research agenda of The Anthropic Institute, or TAI. TAI will focus on four areas:
1) Economic diffusion
2) Threats and resilience
3) AI systems in the wild
4) AI-driven R&D
Read the full agenda:

0

144

2.4K

262


Nathan Lambert@natolambert

2026.05.04 16:42

We need to create a new term for the attacks some Chinese labs are carrying out on APIs, one that is distinct from distillation, or else we risk tarnishing a technique that is crucial to AI diffusion, academic research & the open-source ecosystem.

0

16

153

22


QVAC@qvac

2026.05.04 15:37

QVAC SDK 0.10.0 is now live, bringing advanced local compute capabilities and specialized hardware optimization directly to your device.
Key Features and Updates:
- Image-to-Image Diffusion: Transform and edit images using simple prompts with 100% local compute, with no cloud uploads or external servers required
- Dynamic Tooling & KV Cache Management: Your local LLM now receives a tailored toolbox for every interaction, with automatic KV cache clearing to maintain high-speed inference
- Doctor CLI: A new diagnostic tool that analyzes your hardware and memory, providing specific instructions on how to optimize your GPU for local AI
- Suspend & Resume API: Specifically designed for mobile environments, this allows apps to pause P2P swarms and RAG workspaces to meet background rules without losing model state
- GPT-OSS Compatibility: Added support for the latest GPT-OSS models loaded externally, expanding the range of open-source intelligence available on the platform
Build the future of private, unstoppable AI:

0

2

56

7


Sony AI@SonyAI_global

2026.04.24 00:04

Over several years, we’ve contributed #research to @ICLR exploring how #machinelearning models are trained, interpreted, and applied. This year’s papers span #multimodallearning, #diffusion, interpretability, and theory, with open #code and demos. 🔗

0

8

2


How To AI@HowToAI_

2026.05.14 05:51

NVIDIA has solved the biggest trade-off in LLMs. And it delivers a 6x speed boost without losing a single point of quality.
Every AI you use today (GPT-4, Claude, Gemini) is "Autoregressive." This means the model is forced to think in a straight line, one token at a time, left-to-right. It’s like a genius writer who can only type with one finger. The hardware under the hood, your massive GPU, is actually sitting idle 90% of the time, waiting for that one finger to hit the next key.
NVIDIA published a paper that changes the math. They figured out how to make the AI do two things at once in a single forward pass.
1. The "Talk" (AR): The model handles the immediate next word with perfect logical precision.
2. The "Think" (Diffusion): While it's talking, it uses its "idle" brainpower to parallel-draft the next 10–20 words in advance.
It’s a hybrid brain. The results are a massive wake-up call for the industry:
- 6x Speedup: It delivers close to 6x the tokens per second of standard models.
- Zero Quality Loss: Unlike previous "fast" models that get "blurry" or hallucinate, TiDAR matches the quality of the world’s best LLMs.
- GPU Efficiency: It finally stops wasting the expensive compute power big tech is burning billions on.
We’ve spent years trying to make AI smarter by making it bigger. But this paper proves that the real bottleneck wasn't the size of the brain, it was how the brain was scheduled.
Paper: TiDAR - Think in Diffusion, Talk in Autoregression, 2025
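For readers who want a feel for why drafting several tokens ahead can cut the number of sequential model passes, here is a minimal toy sketch of a generic draft-then-verify loop. It is not TiDAR's method and not taken from the paper: `target_next`, `draft`, and `generate` are hypothetical stand-ins invented for illustration, the "models" are simple arithmetic rules, and the drafter is assumed to be wrong at a fixed interval.

```python
def target_next(prefix):
    # Hypothetical stand-in for the autoregressive model: a fixed, deterministic rule.
    return (sum(prefix) + len(prefix)) % 7


def draft(prefix, k):
    # Hypothetical drafter: follows the target rule but is assumed to err
    # whenever the context length is 3 modulo 4.
    ctx = list(prefix)
    proposal = []
    for _ in range(k):
        tok = target_next(ctx)
        if len(ctx) % 4 == 3:
            tok = (tok + 1) % 7  # deliberately wrong guess
        proposal.append(tok)
        ctx.append(tok)
    return proposal


def greedy(prompt, n_new):
    # Baseline: one token per sequential pass.
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


def generate(prompt, n_new, k):
    # Draft k tokens, then verify them together in what counts as one pass.
    # Matching drafts are kept; the first mismatch is replaced by the target's
    # token and the rest of the draft is discarded.
    seq = list(prompt)
    passes = 0
    while len(seq) - len(prompt) < n_new:
        proposal = draft(seq, k)
        passes += 1
        ctx = list(seq)
        for tok in proposal:
            correct = target_next(ctx)
            ctx.append(correct)
            if tok != correct:
                break
        seq = ctx
    return seq[: len(prompt) + n_new], passes


if __name__ == "__main__":
    out, passes = generate([1], n_new=12, k=4)
    print("same output as greedy:", out == greedy([1], 12))
    print("verification passes:", passes, "vs. 12 sequential passes")
```

Because every pass keeps only tokens the target rule agrees with, the output matches plain greedy decoding; the saving comes purely from needing fewer sequential passes. How large that saving is in a real system depends on how often the drafts are accepted, which this toy does not model.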

0

25

297

41


vLLM@vllm_project

2026.05.08 14:00

🚀 vLLM-Omni v0.20.0 is out — aligned with upstream vLLM v0.20.0 (CUDA 13.0 · PyTorch 2.11 · Transformers 5.x).
⚡ Qwen3-Omni throughput +72% on H20, 32 conc (0.241 → 0.414 req/s) via talker / code2wav multi-replica scaling
🎙️ TTS faster & leaner: VoxCPM2 RTF 0.946 → 0.106 · Fish Speech Fast AR latency -53% · Qwen3-TTS / Voxtral-TTS Code2Wav saves ~3.2 GiB
🎨 Diffusion dynamic step-level batching: +7.8% throughput / -5.8% latency
🆕 New / improved: HunyuanImage-3.0, ERNIE T2I, AudioX, Wan2.2-S2V, LTX-2.3, FastGen Wan 2.1
📱 Wan2.2 on NPU production-ready: MindIE-SD, fused ops, VAE BF16, HSDP/USP — +50–60% perf
🧮 Quant expanded: Qwen Omni W4A16, OmniGen2 FP8, Z-Image FP8, HunyuanImage3 NPU, GLM-Image
🧩 Multi-backend updates across CUDA / ROCm / MUSA / NPU / XPU
Check it out →
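The before/after figures quoted in the post can be sanity-checked with a few lines of arithmetic. This is only a sketch: `pct_change` and `speedup` are hypothetical helper names, and the inputs are the numbers as stated in the post, not independent measurements.

```python
def pct_change(before, after):
    # Relative change from `before` to `after`, in percent.
    return (after - before) / before * 100.0


def speedup(before, after):
    # Ratio for lower-is-better metrics such as real-time factor (RTF).
    return before / after


if __name__ == "__main__":
    # Qwen3-Omni throughput as quoted: 0.241 -> 0.414 req/s
    print(f"throughput change: {pct_change(0.241, 0.414):+.1f}%")
    # VoxCPM2 RTF as quoted: 0.946 -> 0.106
    print(f"RTF improvement: {speedup(0.946, 0.106):.1f}x")
```

The throughput pair works out to just under +72%, consistent with the headline figure, and the RTF pair corresponds to a bit less than a 9x reduction.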

0

15

235

27


Sonya Huang 🐥@sonyatweetybird

2026.04.30 16:37

Every year for AI Ascent, @gradypb, @Konstantine and I get to share some perspectives on AI and where things are headed. This year's talk was about the arrival of agents and the race to deploy them across the application layer, and how founders can compete in this crazy intense market ("Get MAD!" Moats, Affordance, Diffusion).
00:00 Introduction
01:10 AI Wave Calibration
01:56 Three Differences of AI
04:28 Inflection Points to AGI
07:06 Building on Top Strategy
07:29 MAD Moats Framework
09:33 Affordance and Diffusion
12:07 Agents Are Here Now
13:46 Agent Stack and Trajectory
21:12 Future of Work and Meaning

0

9

99

5


Andrew Ng@AndrewYNg

2026.04.09 17:11

New course: Efficient Inference with SGLang: Text and Image Generation, built in partnership with LMSys @lmsysorg and RadixArk @radixark, and taught by Richard Chen @richardczl, a Member of Technical Staff at RadixArk.
Running LLMs in production is expensive, and much of that cost comes from redundant computation. This short course teaches you to eliminate that waste using SGLang, an open-source inference framework that caches computation already done and reuses it across future requests. When ten users share the same system prompt, SGLang processes it once, not ten times. The speedups compound quickly, especially when there's a lot of shared context across requests.
Skills you'll gain:
- Implement a KV cache from scratch to eliminate redundant computation within a single request
- Scale caching across users and requests with RadixAttention, so shared context is only processed once
- Accelerate image generation with diffusion models using SGLang's caching and multi-GPU parallelism
Join and learn to make LLM inference faster and more cost-efficient at scale!
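The "ten users, one system prompt" point can be illustrated with a toy prefix cache. This is a minimal sketch under stated assumptions, not SGLang's API or its RadixAttention implementation: `ToyPrefixCache`, `TrieNode`, and `simulate` are hypothetical names, a plain token trie stands in for the real data structure, and "computation" is modeled only as a count of tokens that miss the cache.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Children keyed by token; each node stands in for one cached per-token entry.
    children: dict = field(default_factory=dict)


class ToyPrefixCache:
    """Toy prefix cache that counts tokens needing fresh computation."""

    def __init__(self):
        self.root = TrieNode()

    def process(self, tokens):
        """Return (reused, computed) token counts for one request."""
        node = self.root
        reused = 0
        computed = 0
        for tok in tokens:
            if tok in node.children:
                reused += 1  # prefix seen before: reuse the cached entry
            else:
                node.children[tok] = TrieNode()
                computed += 1  # new suffix: must be computed and stored
            node = node.children[tok]
        return reused, computed


def simulate(num_users=10, system_len=100, user_len=5):
    # Every request is the shared system prompt followed by a unique user part.
    cache = ToyPrefixCache()
    system_prompt = [("sys", i) for i in range(system_len)]
    total_computed = 0
    for u in range(num_users):
        user_part = [("user", u, i) for i in range(user_len)]
        _, computed = cache.process(system_prompt + user_part)
        total_computed += computed
    return total_computed


if __name__ == "__main__":
    with_cache = simulate()
    without_cache = 10 * (100 + 5)
    print(with_cache, "tokens computed with prefix reuse vs.", without_cache, "without")
```

With these assumed sizes, the shared 100-token prompt is computed once and only each user's 5 unique tokens are computed afterwards, so the toy counts 150 tokens instead of 1050. Matching on prefixes rather than arbitrary substrings mirrors the fact that a token's cached state depends on everything before it.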

0

68

541

82
