Register and share your invite link to earn from video plays and referrals.

Lawrence Chan
@justanotherlaw
I do AI Alignment Research. Formerly @METR_Evals, @redwood_ai; on leave from my PhD at UC Berkeley’s @CHAI_berkeley. Opinions are my own.
170 Following    2.5K Followers
Glad more AI safety work is getting reviewed by independent parties. Most lab posts never go through peer review, and when they do it's with a 4+ month lag: an eternity in AI terms. Public reviews like Buck'scan help surface methodological gaps and keep researchers honest.
Show more
We reviewed OpenAI's blog post “Investigating the consequences of accidentally grading CoT during RL".
Forgot to add: it was great working with @ben_sturgeon on this! He was patient as I raised objection after objection from reading data/transcripts + he taught me a lot about managing multiple AI agents. He’s a MATS extension scholar on persona stuff; if you’re hiring, reach out!
Show more
A recent viral paper claims to reverse-engineer the parameter counts of frontier models: GPT-5.5 = 9.7T, Opus 4.7 = 4.0T, o1 = 3.5T, etc. @ben_sturgeon and I investigated and found serious issues in the paper; fixing them gives GPT-5.5 as ~1.5T (90% CI: 256B-8.3T).
Show more
In 2022, I joined what was then ARC Evals. Last Friday, I wrapped up at @METR_Evals. METR has done some of the most important work in AI; I'm grateful to @BethMayBarnes and others for letting me be part of it. I'll be taking time to write, reflect, and think. More to come soon!
Show more
Two reflections on my IKP sanity check with @justanotherlaw. Viral papers are cheaper than ever to produce thanks to AI agents. Thankfully, AI agents also make checking viral claims easier than ever.
A recent viral paper claims to reverse-engineer the parameter counts of frontier models: GPT-5.5 = 9.7T, Opus 4.7 = 4.0T, o1 = 3.5T, etc. @ben_sturgeon and I investigated and found serious issues in the paper; fixing them gives GPT-5.5 as ~1.5T (90% CI: 256B-8.3T).
Show more
Current AIs (Opus 4.5/4.6) seem pretty misaligned to me (in a mundane behavioral sense). In my experience, they often oversell their work, downplay problems, and stop early while claiming to be done. They sometimes brazenly cheat.
Show more