Percy Liang (@percyliang) — X Web Viewer


Percy Liang

@percyliang

professor of computer science @Stanford @stanfordnlp, co-founder of @togethercompute, creator of co-founder of @simile_ai, pianist

426 Following 104.5K Followers

Percy Liang Reposted

Kevin Lin@KevinQHLin

2026.05.14 20:31

🌟Introducing🎻Violin — an Open-source Video Translation Skill. 📹Video is the dominant medium on the internet, yet most high-quality content (lecture, talk, podcast) is locked behind a single language, leaving global audiences behind. So we built Violin: a video skill that combines speech recognition, LLM translation, and speech synthesis into one seamless pipeline. 🌐 Demo: 📝 Blog: 🔗 GitHub: ✨Key Features: 🎙️High-quality multilingual ASR & Translation & TTS. 🗣️Personalize translation & voice (turn an academic talk into something children can follow). 💬Chat with the video — ask any questions grounded in the video. 🧩Support Web app, CLI, and Agent skill 🍃Fully open-source under MIT. ❤️Built with the wonderful @ShangZhu18 and advised by @james_y_zou ! All features powered by @togethercompute . Try it and let us know what you think! 🎻


Percy Liang Reposted

Sara Hooker@sarahookr

2026.05.13 13:16

Most model trainings have failed outside of frontier labs. Even inside frontier labs, knowing how to train for very different capabilities is often a matter of taste. Today, we introduce AutoScientist by @adaption_ai which sets out to change that.


Percy Liang@percyliang

2026.05.13 16:47

Going into the next Marin run.

Kevin Li@kevin_x_li

2026.05.13 16:33

Introducing SWE-ZERO-12M-trajectories: the largest agentic trace dataset in the open, 5.7x larger than the previous largest. 112B tokens · 12M trajectories · 122K PRs · 3K repos · 16 languages


Percy Liang Reposted

Larry Dial@classiclarryd

2026.05.12 21:24

AI Agent literature/web review can get much better. It was peculiar to see how under-the-radar the NanoGPT Speedrun was for agents during Parameter Golf. Many objective improvements, like faster RoPE, were not copied. SmearGate was copied incorrectly, and only fixed after a month. Several others were copied in the last couple days, often by the original speedrun author. Even the attributions were not aware of the NanoGPT origins.


Percy Liang Reposted

Will Held@WilliamBarrHeld

2026.05.11 19:25

To train better open models, we need predictable scaling. Delphi is Marin’s first step: we pretrained many small models with one recipe, then extrapolated 300× to predict a 25B-param / 600B-token run with just 0.2% error. Getting there took some work 🧵


Percy Liang Reposted

Zhaorun Chen@ZRChen_AISafety

2026.05.09 17:55

AI agents are already going wild, but today’s red-teaming tools for them are still like toys 😢 🔥👽 After spending 20 months and $120K API credits, we are excited to finally open-source DecodingTrust-Agent Platform (DTap): the first controllable, realistic simulation platform for advanced AI agent red-teaming !! 🌍 DTap simulates 50+ real-world environments across 14 high-stakes domains, with realistic agent interfaces replicated from their official MCPs and GUIs. The environments are full-stack, interactive, fully parallelizable, and can be easily configured to reproduce arbitrary real-world attack scenarios, making agent red-teaming scalable and highly transferable to deployment settings. 🔥We also release DTap-Bench, a large-scale benchmark with ~7K agent red-teaming tasks and ~4K policy-grounded malicious goals. Each red-teaming task includes a sophisticated attack sequence across environment-, tool-, skill-, prompt-level injections, as well as their compositions, plus a handcrafted verifiable judge that checks the actual consequences in the environment. Using DTap-Bench, we evaluate popular agent frameworks and backbone models across diverse policies, risks, threat models, and attack strategies, revealing systematic vulnerabilities and zero-days in today’s agents! Paper link: Platform + benchmark + code: Join our Discord: Read more below 👇


Percy Liang Reposted

Ken Liu@kenziyuliu

2026.05.05 18:46

Had a great time discussing AI user privacy on @augmind_fm 😃 One discussion I’d like to highlight from the chat is that what constitutes the "Privacy Problem" has been shifting as AI progresses. It used to be that we care a lot about *training-time* user privacy: what gets trained into the model, and what the model would spit out. Say you take an LLM and a book (or any piece of sensitive text). We cared about whether the book would be regurgitated ("memorization"); whether you can remove such a book from the model ("unlearning"); and whether you can detect the book being trained ("membership inference"). And as part of mitigating these problems, we work on training-time techniques like differential privacy, careful data cleaning, and model alignment/guardrails (in ~increasing order of adoption). Guardrails seem to work well enough that people don’t really talk about sensitive model outputs anymore. What’s more pressing today, I argue, is *inference-time* user privacy: the fact that intelligent models are served at scale on private user data, which are then centrally managed at model providers. Intelligent models mean that user profiling is now cheap and automatic; your activities can be continuously analyzed to reveal new sensitive insights. Whether your data is trained on or not became less relevant. Having a "digital clone" of you by building on your memory/personalization is now way more profitable. The threat vector changed from the model misbehaving to the provider misbehaving. Because of this, the techniques to improve user privacy would look different than before. They’ll look less like fancy learning algorithms (e.g. RL to steer model to output paraphrase of a book than the original book), and more like *peripheral systems* sitting around closed models that we do not control but still want to access. 
The OA project ( is an example: you could build a zero-knowledge proxy to mediate AI inference and combat surveillance, and leverage smaller models to help users build personal memory on-device. This is not to say that there’s no room for training; you just train for different things, and on auxiliary models than the closed models. thank you so much to @EchoShao8899 @michaelryan207 @shannonzshen for hosting me!


Percy Liang@percyliang

2026.05.05 17:54

I find myself repeatedly explaining the difference between open-weight (DeepSeek), open-source (Olmo), open-development (Marin). Let's see if this restaurant analogy helps:
- Open-weight: food is made behind closed doors, server brings you the dish
- Open-source: food is made behind closed doors, server brings you the dish and the recipe
- Open-development: you see the chef make the dish in the kitchen (and can shout suggestions while it's cooking)!


Percy Liang Reposted

Kaiyue Wen@wen_kaiyue

2026.05.01 17:01

Playing with an optimizer speedrun is something that never gets old. Built on top of and Claude should take all the credit for hypertl tuning.


Percy Liang Reposted

Keller Jordan@kellerjordan0

2026.05.01 16:56

New modded-NanoGPT optimization benchmark result: @wen_kaiyue has improved upon both the Muon and AdamW baselines, by replacing their weight decay with hyperball optimization. The new record is 3325 steps.


Percy Liang Reposted

Nick Levine@status_effects

2026.04.27 21:34

New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:


Percy Liang@percyliang

2026.04.25 16:59

It is liberating being able to talk about what you work on.


Percy Liang Reposted

Luke Bailey@LukeBailey181

2026.04.23 15:42

Self-play led to superhuman Go performance, why hasn’t it for LLMs? In practice, long run self-play plateaus like RL. We study why this happens, and build a self-play algorithm that scales better. It solves as many problems with a 7B model as the pass@4 of a model 100x bigger.


Percy Liang Reposted

Kaiyue Wen@wen_kaiyue

2026.04.24 17:56

I won't be at ICLR this year, but @xingyudang will help present Fantastic Optimizers. Stop by Pavilion 4 P4 5309 this afternoon to see what we found in extensive sweeping and, more importantly, what we learned after the paper that led to Hyperball!


Percy Liang Reposted

2026.04.21 20:31

Introducing @NeoCognition, the agent lab for specialized intelligence. Everyone needs experts, but human expertise does not scale. Backed by $40M seed funding, we build self-learning agents that specialize across domains to make expertise abundant.


Percy Liang Reposted

Wentao Guo@WentaoGuo7

2026.04.22 17:38

🚀SonicMoE🚀now runs at peak throughput on NVIDIA Blackwell GPUs 😃 54% & 35% higher fwd/bwd TFLOPS than the DeepGEMM baseline and 21% higher fwd TFLOPS than the triton official example. SonicMoE still maintains its minimum activation memory footprint: the same as a dense model with equal activated parameters and independent of expert granularity. We wrote a blogpost on how we leveraged Blackwell features and the software abstraction on QuACK: Work with @MayankMish98, @XinleC295, @istoica05, @tri_dao


Percy Liang Reposted

Sherry Yang@sherryyangML

2026.04.21 21:56

Machine learning engineering (MLE) is the new agentic frontier. I'll be sharing our work on scaling RL for MLE agents at #ICLR2026: 1) RL of a small model outperforms a frontier model 2) MLE-Smith: scale-up MLE tasks automatically


Percy Liang@percyliang

2026.04.17 05:25

Marin is using quantile balancing from @Jianlin_S (who developed RoPE, which was also a good idea) to train our current 1e23 FLOPs MoE. The idea is elegant: assigning tokens to experts by solving a linear program. No hyperparameters to tune. Yields stable training.


Larry Dial@classiclarryd

2026.04.15 16:26

Researchers' brilliant ideas often get lost in the sea of endless SOTA claims on weak baselines. At Marin we battle-test ideas in an open arena, where anyone's idea can be promoted to the next hero run. One that recently rose up was @Jianlin_S MoE Quantile Balancing, used in our last 1e22 and ongoing 130B run. Animated visuals of how QB performed are available in the OpenAthena blog.


Percy Liang@percyliang

2026.03.10 15:32

I think it’s pretty clear that simulation is the next frontier for AI. The most impressive feats of AI to date are when we have a clear environment + reward, whether it be beating Lee Sedol at Go, winning an IMO gold medal, or writing entire apps from scratch. In these cases, the RL algorithm can try different actions, and observe the well-defined consequences in the safety of a docker container. But what about messy real-world situations involving people? The rewards are unclear, the stakes are high, and you can’t experiment in the real world. But these situations are precisely where the next big opportunity in AI is. To crack this, we need to *simulate* society (“put society into a docker container”). Concretely, this means building a model that can predict what will happen in any given situation (real or hypothetical). If we can do this, we are only limited by our imagination: predict the future, optimize for better outcomes, answer hypothetical (“what if”) questions. Ultimately, this goes beyond making better decisions; it’s about giving us a better understanding of ourselves and the world. Simulation is the whole enchilada. And this is exactly the research that @simile_ai is working on. Read more here:

