Lee Sharkey(@leedsharkey):My team at @GoodfireAI has been cooking up a new way to do interpretability: decompose a language model’s weights, not its activations. Our decomposition natively handles attention (!) and behaves less like a lookup table and more like a generalizing algorithm. (1/6)

Lee Sharkey

@leedsharkey

Scruting matrices @ Goodfire | Previously: cofounded Apollo Research

加入 March 2015

1.6K 正在關注 3.3K 粉絲

Lee Sharkey@leedsharkey

2026.05.05 17:34

My team at @GoodfireAI has been cooking up a new way to do interpretability: decompose a language model’s weights, not its activations. Our decomposition natively handles attention (!) and behaves less like a lookup table and more like a generalizing algorithm. (1/6)