Most voice AI hears what was said.
Velma hears what was meant.
Tone. Intent. Emotion. Deception.
The standard stack - STT to transcribe, LLM to analyse - throws all of it away before analysis even begins.
The signal was lost the moment audio became text... (🧵)