Can LLMs simply tell us about unwanted behaviors they’ve picked up in training?
We train a single Introspection Adapter (IA) that enables fine-tuned models to describe their own behaviors.
It generalizes to detecting hidden misalignment, backdoors, and safeguard removal.