A toy example: Train an AI only to say it likes certain cheeses.
If we apply MSM with a spec that attributes these cheese preferences to pro-America values, the AI learns broad pro-America values.
Swap to a pro-affordability spec? The AI learns to value affordability instead. https://t.co/6NZIj8VrcF