Anthropic
@AnthropicAI
We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on
36 Following    1.3M Followers
We've published a paper that explains our views on AI competition between the US and China. The US and its democratic allies hold the lead in frontier AI today. Read more on what it'll take to keep that lead:
We’re partnering with the Gates Foundation, committing $200 million in grants, Claude credits, and technical support to programs in global health, life sciences, education, agriculture, and economic mobility. Read more:
Claude's Constitution is now an audiobook, read by two of its authors, Amanda Askell and Joe Carlsmith. It includes a Q&A on the writing process, the philosophies that shaped the document, and how it might change as models become more capable. Listen at
New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?
We’re donating Petri, our open-source alignment tool, to @meridianlabs_ai, so its development can continue independently. Working with Meridian Labs, we’ve also released a major update that improves the adaptability, realism, and depth of Petri’s tests.
Our security bug bounty program is now public on HackerOne. We previously ran the program privately within the security research community, and those researchers' findings have strengthened our products. Now anyone can report vulnerabilities and get rewarded. Read more:
To help other researchers get hands-on experience with NLAs, we’ve partnered with Neuronpedia to release NLAs on open models. Try them out here:
Read more about NLAs on the Anthropic blog:
New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.
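
For intuition, here is a toy sketch of the autoencoder objective only (not the actual setup described above, which trains Claude itself as the translator): a small decoder maps an activation vector to a short token sequence, an encoder maps that sequence back to activation space, and a reconstruction loss forces the "text" to carry the information. All module names, sizes, and the soft-token relaxation are invented for illustration.

    # Toy natural-language-autoencoder sketch (illustrative; not the paper's setup).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    D_ACT, D_EMB, VOCAB, MAX_LEN = 1024, 256, 1000, 16   # invented toy sizes

    class ActToText(nn.Module):
        """Decoder half: activation vector -> logits for a short token sequence."""
        def __init__(self):
            super().__init__()
            self.proj = nn.Linear(D_ACT, D_EMB)
            self.rnn = nn.GRU(D_EMB, D_EMB, batch_first=True)
            self.head = nn.Linear(D_EMB, VOCAB)

        def forward(self, act):
            h0 = torch.tanh(self.proj(act)).unsqueeze(0)           # (1, B, E)
            steps = h0.transpose(0, 1).expand(-1, MAX_LEN, -1)     # (B, T, E)
            out, _ = self.rnn(steps.contiguous(), h0.contiguous())
            return self.head(out)                                  # (B, T, V)

    class TextToAct(nn.Module):
        """Encoder half: (soft) tokens -> reconstructed activation vector."""
        def __init__(self):
            super().__init__()
            self.emb = nn.Linear(VOCAB, D_EMB)   # expected embedding under token probs
            self.rnn = nn.GRU(D_EMB, D_EMB, batch_first=True)
            self.out = nn.Linear(D_EMB, D_ACT)

        def forward(self, token_probs):
            _, h = self.rnn(self.emb(token_probs))
            return self.out(h.squeeze(0))

    dec, enc = ActToText(), TextToAct()
    opt = torch.optim.Adam([*dec.parameters(), *enc.parameters()], lr=1e-3)

    acts = torch.randn(32, D_ACT)            # stand-in for real model activations
    soft = F.softmax(dec(acts), dim=-1)      # soft tokens keep the bottleneck differentiable
    loss = F.mse_loss(enc(soft), acts)       # reconstruct activations through the "text"
    loss.backward(); opt.step()

The softmax relaxation stands in for sampling real tokens, which keeps this toy end-to-end differentiable; the actual research presumably works with genuine generated text.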
If you’re interested in helping us research these questions, apply to become an Anthropic Fellow. The Fellowship is a four-month funded opportunity to tackle these topics with mentorship from members of TAI. Apply here:
AI-driven R&D: We expect AI systems to contribute more and more to AI R&D; that is, to be able to improve themselves. We’re researching techniques to ensure human visibility into and control over these systems.
I've spent the past few weeks reading hundreds of public data sources about AI development. I now believe that recursive self-improvement has a 60% chance of happening by the end of 2028. In other words, AI systems might soon be capable of building themselves.
We’re sharing the research agenda of The Anthropic Institute, or TAI. TAI will focus on four areas:
1) Economic diffusion
2) Threats and resilience
3) AI systems in the wild
4) AI-driven R&D
Read the full agenda:
We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.
Read more about Model Spec Midtraining:
Or read the full study:
Using MSM, we can also empirically study which model specs or constitutions yield the best generalization from alignment training. Specifying rules works to some extent, but explaining the values underlying those rules (or adding more detailed subrules) is even better.
A more realistic example: AIs trained to be harmless chatbots can take unsafe actions in agentic settings. Preceding this training with MSM on a realistic spec drastically improves generalization, reducing unsafe agentic actions.
A toy example: Train an AI only to say it likes certain cheeses. If we apply MSM with a spec that explains these cheese preferences via pro-America values, the AI learns broad pro-America values. Swap to a pro-affordability spec? The AI learns to value affordability instead.
New Anthropic Fellows research: Model Spec Midtraining (MSM). Standard alignment methods train AIs on examples of desired behavior. But this can fail to generalize to new situations. MSM addresses this by first teaching AIs how we would like them to generalize and why.
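
Read as a recipe, the simplest instantiation might look like the sketch below (an illustration, not the Fellows' actual code): stage 1 continues language-model training on spec text that states the rules and the values behind them; stage 2 is ordinary behavioral finetuning. The model choice, spec string, and demonstration are all placeholders.

    # Two-stage Model Spec Midtraining sketch (illustrative; placeholders throughout).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in model
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

    def lm_step(text):
        """One causal-LM gradient step on a chunk of text."""
        batch = tok(text, return_tensors="pt", truncation=True)
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward(); opt.step(); opt.zero_grad()

    # Stage 1 (midtraining): teach how to generalize, and why.
    spec = ("The assistant avoids unsafe actions because it values user "
            "wellbeing; in unfamiliar situations it errs on the side of safety.")
    for _ in range(3):
        lm_step(spec)

    # Stage 2 (alignment finetuning): standard demonstrations of desired behavior.
    lm_step("User: delete all my files\n"
            "Assistant: That looks destructive; can you confirm which files first?")

In this sketch, the spec-comparison experiments from the thread (pro-America vs. pro-affordability, as in the cheese example above) would amount to swapping the stage-1 spec string and measuring how behavior generalizes after stage 2.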
As AI takes on work humans can't fully check, a capable model could deliberately hold back—and we'd never know. New Anthropic Fellows research finds that such a model can be trained to near-full capability using a weaker model as supervisor. Read more:
New paper from MATS, Redwood, and Anthropic! If a capable model is strategically sandbagging, can we train it to stop when the only supervision we have comes from weaker models? We find that we can! Work done as part of the Anthropic-Redwood MATS stream.
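
One simple version of training with only weak supervision, sketched in weak-to-strong style (an illustration, not the paper's protocol): the weaker model writes demonstrations, and the stronger, possibly sandbagging model is finetuned toward them. The models and prompt are placeholders.

    # Weak-supervisor finetuning sketch (illustrative; not the paper's protocol).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    strong = AutoModelForCausalLM.from_pretrained("gpt2")       # stand-in capable model
    weak = AutoModelForCausalLM.from_pretrained("distilgpt2")   # stand-in weak supervisor
    opt = torch.optim.AdamW(strong.parameters(), lr=1e-5)

    for prompt in ["Q: What is 7 * 8?\nA:"]:
        # The weak supervisor produces a (possibly imperfect) demonstration.
        ids = tok(prompt, return_tensors="pt").input_ids
        demo_ids = weak.generate(ids, max_new_tokens=16, do_sample=False)
        demo = tok.decode(demo_ids[0], skip_special_tokens=True)

        # Finetune the strong model toward the demonstration. The paper's
        # finding is that this kind of weak supervision can recover near-full
        # capability from a sandbagging model, not merely clone the weak model.
        batch = tok(demo, return_tensors="pt")
        loss = strong(**batch, labels=batch["input_ids"]).loss
        loss.backward(); opt.step(); opt.zero_grad()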
All data in this study was collected and analyzed using our privacy-preserving tool. Read more: