my take on subq is that it’s not that big of a deal that someone benchmaxxed a linear attention model on MRCR v2 and SWE-bench
people have already shown that you can linearize an oss model pretty cheaply without crazy perf loss
just not that useful in practice