가입 후 초대 링크를 공유하면 동영상 재생 및 초대 보상을 받을 수 있습니다.

Luke Bailey
@LukeBailey181
CS PhD student @Stanford advised by @tengyuma & @tatsu_hashimoto. Former CS and Math undergraduate @Harvard. Website:
가입 July 2023
357 팔로잉 중    994
Self-play led to superhuman Go performance, why hasn’t it for LLMs? In practice, long run self-play plateaus like RL. We study why this happens, and build a self-play algorithm that scales better. It solves as many problems with a 7B model as the pass@4 of a model 100x bigger.
더 보기