An AI model can get an ECG diagnosis right without reading the ECG.
We tested frontier models from
@OpenAI,
@AnthropicAI, and
@GoogleDeepMind, alongside smaller open models.
After fine-tuning, models improved at predicting heart rate and electrical axis. But those values were already included in the prompt as machine-generated measurements.
When the answer was in the text, the models learned it.
When the answer was only in the waveform -> rhythm, conduction abnormalities, ischemic changes — they mostly learned the prior.
Across model families, label formats, grid removal, stacked leads, and separate lead images, waveform-dependent performance stayed close to the majority-class baseline.
A single accuracy number can hide prompt leakage, class imbalance, and missed abnormalities.
We ran seven experiments to figure out what ECG models are actually learning:
Thumbnail artwork inspired by J. Vermeer.