Some comments on Taalas HC1:
- It’s real. Try it yourself. At ~16k tokens/sec, the output is instantaneous.
- The current demo model is aggressively quantized (roughly 3–6 bits). The goal was to prove the system works end-to-end. Improving quantization quality, that's the easy
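To put "instantaneous" and "3–6 bits" in concrete terms, here is a back-of-envelope sketch. It assumes a steady ~16k tokens/sec stream and ignores time-to-first-token, which the post does not give. It also uses a 16-bit (FP16/BF16) baseline for comparison and a hypothetical 500-token response; both are illustrative assumptions, not Taalas figures.

```python
# Back-of-envelope arithmetic for the demo numbers quoted above.
# Assumptions (not from Taalas): steady-state streaming rate, no time-to-first-token,
# a 16-bit baseline weight width, and a hypothetical 500-token response.

TOKENS_PER_SEC = 16_000  # "~16k tokens/sec" from the post
BASELINE_BITS = 16       # assumed unquantized weight width (FP16/BF16)


def per_token_latency_us(tokens_per_sec: float) -> float:
    """Average time per generated token, in microseconds."""
    return 1e6 / tokens_per_sec


def response_time_ms(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream num_tokens at a steady rate, in milliseconds."""
    return 1e3 * num_tokens / tokens_per_sec


def compression_vs_baseline(bits: float, baseline_bits: int = BASELINE_BITS) -> float:
    """How many times smaller quantized weights are than the baseline width."""
    return baseline_bits / bits


if __name__ == "__main__":
    print(f"per token: {per_token_latency_us(TOKENS_PER_SEC):.1f} us")
    print(f"500 tokens: {response_time_ms(500, TOKENS_PER_SEC):.2f} ms")
    for bits in (3, 4, 6):
        print(f"{bits}-bit weights: {compression_vs_baseline(bits):.2f}x smaller than 16-bit")
```

At that rate a full 500-token answer streams in roughly 31 ms, well under what a reader perceives as a delay. The 3 to 6 bit range means weights are about 2.7x to 5.3x smaller than a 16-bit baseline, which is why quantization quality matters for output quality.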
Yesterday we chatted with @martin_casado and @sarahdingwang on the pod, and Martin happened to do basic math™ on the logic of ASICs.
Today @taalas_inc launched their HC1 ASIC, which can run inference at 17k tok/s. Sure, today it runs a shitty Llama 3.1 8B, which is a 1.5-year gap.
But read the details:
- 24 dedicated people.
- $30M spent on development.
- Extreme specialization, speed, and power efficiency.
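The details above invite some basic arithmetic. The sketch below estimates the raw weight storage of an 8B-parameter model at 16 bits and at the 3 to 6 bit range quoted in the first comment. It also divides the $30M budget by the 24-person team. It assumes exactly 8e9 parameters (the real Llama 3.1 8B count is slightly higher) and ignores quantization scales, zero-points, KV cache, and activations, so treat the figures as rough lower bounds.

```python
# Rough weight-storage footprint of an ~8B-parameter model, plus budget per person.
# Assumptions: exactly 8e9 parameters; no overhead for quantization metadata,
# KV cache, or activations. Budget and team size come from the post.

PARAMS = 8_000_000_000


def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at the given width."""
    return num_params * bits_per_weight / 8


def weight_gb(num_params: int, bits_per_weight: float) -> float:
    """Same as weight_bytes, expressed in decimal gigabytes."""
    return weight_bytes(num_params, bits_per_weight) / 1e9


def spend_per_person_musd(budget_musd: float, team_size: int) -> float:
    """Development spend per team member, in millions of USD."""
    return budget_musd / team_size


if __name__ == "__main__":
    for bits in (16, 6, 4, 3):
        print(f"{bits:>2}-bit: {weight_gb(PARAMS, bits):.1f} GB")
    print(f"${spend_per_person_musd(30, 24):.2f}M of development spend per person")
```

Going from 16 to 3 bits shrinks the weights from about 16 GB to about 3 GB, and the stated budget works out to roughly $1.25M per person.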
Today we launch Taalas’ first product. Check it out:
Details:
Demo chatbot:
API: