This is just a gut feeling, but it seems to me that this partially points to why text diffusion in continuous space (then rounding to the nearest token) doesn’t yet work.
It’s possible that after a bit more interp we’d be able to actually pull off continuous text diffusion.
A simple example: days of the week, which lie on a circular path in models’ activations.
Steering linearly from Monday to Friday gets you incoherent outputs in between. Steering along the circular manifold means you cleanly shift from Mon → Tues → Wed → Thurs → Fri. (5/8)
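A toy sketch of the geometry, not real model activations: here I just assume the seven days sit evenly spaced on a unit circle in 2D (all names like `embed` and `nearest_day` are made up for illustration). The straight line from Mon to Fri cuts through the inside of the circle, where no day lives, while the path along the circle stays on the manifold and passes each day in order.

```python
import numpy as np

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_angle(i):
    # Assumption: days are evenly spaced around a circle.
    return 2 * np.pi * i / len(DAYS)


def embed(angle):
    # Toy 2D "activation" for a point on the unit circle.
    return np.array([np.cos(angle), np.sin(angle)])


DAY_VECS = np.stack([embed(day_angle(i)) for i in range(len(DAYS))])


def nearest_day(vec):
    # Round a point to the closest day vector (Euclidean distance).
    dists = np.linalg.norm(DAY_VECS - vec, axis=1)
    return DAYS[int(np.argmin(dists))]


def linear_path(start, end, steps):
    # Straight-line interpolation between the two day vectors.
    a, b = embed(day_angle(start)), embed(day_angle(end))
    return [(1 - t) * a + t * b for t in np.linspace(0.0, 1.0, steps)]


def circular_path(start, end, steps):
    # Interpolate the angle instead, so every point stays on the circle.
    angles = np.linspace(day_angle(start), day_angle(end), steps)
    return [embed(a) for a in angles]


linear = linear_path(0, 4, 5)      # Mon -> Fri, straight line
circular = circular_path(0, 4, 5)  # Mon -> Fri, along the circle

linear_norms = [float(np.linalg.norm(v)) for v in linear]
circular_norms = [float(np.linalg.norm(v)) for v in circular]
circular_days = [nearest_day(v) for v in circular]

print("linear norms:  ", np.round(linear_norms, 3))
print("circular norms:", np.round(circular_norms, 3))
print("circular days: ", circular_days)
```

In this toy setup the norm of the linear path drops well below 1 in the middle (the point is far from every day vector), while the circular path keeps norm 1 the whole way. Real activations are high-dimensional and the manifold has to be found first, so treat this only as a picture of the idea.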