New paper from MATS, Redwood, and Anthropic!
If a capable model is strategically sandbagging, can we train it to stop when the only supervision we have comes from weaker models?
We find that we can!
Work done as part of the Anthropic-Redwood MATS stream.
Mexican-born lawful permanent resident Miguel Angel Rodriguez-Ramirez was sentenced to 14 years in prison for continuous sexual abuse of a child under 14 years old in Redwood City, Calif.
We arrested the child predator at-large May 4 in a targeted operation, and he’ll remain in ICE custody to face an immigration judge’s deportation decision later this month.
The key to distributed success is context.
Peter Pistorius, CEO of RedwoodJS: "The availability of context for why you're doing a particular task is going to be easier to get."
AI may help deliver that context across decentralized teams. Watch the discussion in this episode of Beyond the App Stack: