Using MSM, we can also empirically study which model specs or constitutions yield the best generalization from alignment training.
Specifying rules works to some extent, but explaining the values underlying those rules (or adding more detailed subrules) is even better. https://t.co/b2XKbyBGeI