Known "vocal" (440 Hz carrier, 5 Hz vibrato ±30 cents, 2 harmonic partials) +
known "backing" (110 Hz bass + 4 Hz click train), mixture = sum.
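
A minimal sketch of how such test signals could be synthesized. The sample rate, duration, partial amplitudes, and one-sample click shape are assumptions, not values from the original run. "2 harmonic partials" is read here as harmonics 2 and 3 above the carrier. The helper names (vibratoFreq, synthVocal, synthBacking, mix) are hypothetical.

```javascript
// Sketch only: SAMPLE_RATE, DURATION_S and every amplitude are assumptions.
const SAMPLE_RATE = 16000;
const DURATION_S = 2;

// Instantaneous carrier frequency: 440 Hz with 5 Hz vibrato of +/-30 cents.
function vibratoFreq(t) {
  const cents = 30 * Math.sin(2 * Math.PI * 5 * t);
  return 440 * Math.pow(2, cents / 1200);
}

// "Vocal": carrier plus harmonics 2 and 3, all sharing the same vibrato.
function synthVocal(sampleRate = SAMPLE_RATE, durationS = DURATION_S) {
  const n = Math.round(sampleRate * durationS);
  const out = new Float32Array(n);
  const partialAmps = [1.0, 0.5, 0.25]; // carrier, 2nd, 3rd (assumed levels)
  let phase = 0; // running phase of the fundamental, in radians
  for (let i = 0; i < n; i++) {
    // Accumulate phase so the frequency modulation stays continuous.
    phase += (2 * Math.PI * vibratoFreq(i / sampleRate)) / sampleRate;
    let s = 0;
    for (let h = 0; h < partialAmps.length; h++) {
      s += partialAmps[h] * Math.sin((h + 1) * phase);
    }
    out[i] = s;
  }
  return out;
}

// "Backing": 110 Hz sine bass plus a one-sample click every 1/4 second.
function synthBacking(sampleRate = SAMPLE_RATE, durationS = DURATION_S) {
  const n = Math.round(sampleRate * durationS);
  const out = new Float32Array(n);
  const clickPeriod = Math.round(sampleRate / 4);
  for (let i = 0; i < n; i++) {
    out[i] = 0.8 * Math.sin((2 * Math.PI * 110 * i) / sampleRate);
    if (i % clickPeriod === 0) out[i] += 1.0;
  }
  return out;
}

// Mixture is the sample-wise sum of the two known parts.
function mix(a, b) {
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] + b[i];
  return out;
}

const vocal = synthVocal();
const backing = synthBacking();
const mixture = mix(vocal, backing);
```

Because both parts are generated in code, the true vocal and true backing stay available for comparison against whatever is recovered from the mixture.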
processAudioToFingerprints on vocal and mixture →
optimizeEqCurves (100 iters) → reconstructVocal on
the mixture STFT. Everything below is measured against the known parts.
Honest note: the optimizer is plain full-batch gradient descent with
NO input normalization, so stability requires lr < 1/max(|STFT|)².
At the natural signal levels here that means running at lr = 1e-5
(node-verified: lr 0.01 slams the EQ against its clip bounds and the loss
oscillates 294↔594 instead of descending).
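
A toy illustration of the step-size limit above, not the real optimizer: one gain g is fitted to one magnitude bin x with loss 0.5·(g·x − y)². The curvature is x², so plain gradient descent only contracts for lr < 2/x², and lr < 1/x² is a safe margin inside that. The magnitude x = 300 is an illustrative value picked so that 1/x² lands near 1e-5; it is not a measurement. The clip range [0, 2], the target gain 0.5, and the function name descend are also assumptions.

```javascript
// Toy model: a single clipped gain trained by plain gradient descent.
// loss(g) = 0.5 * (g * x - y)^2  =>  dloss/dg = x * (g * x - y)
function descend(lr, x, y, iters, clipLo = 0, clipHi = 2) {
  let g = 1; // start at unity gain
  const losses = [];
  for (let i = 0; i < iters; i++) {
    const err = g * x - y;
    losses.push(0.5 * err * err); // loss before this step's update
    g -= lr * x * err; // unnormalized gradient step
    g = Math.min(clipHi, Math.max(clipLo, g)); // clip the gain to its bounds
  }
  return { g, losses };
}

const x = 300; // illustrative bin magnitude (assumed): 1 / x^2 ~ 1.1e-5
const y = 0.5 * x; // target corresponds to a true gain of 0.5

const stable = descend(1e-5, x, y, 100); // lr below 1 / x^2
const unstable = descend(0.01, x, y, 100); // lr far above 2 / x^2

console.log("lr=1e-5 final gain:", stable.g);
console.log("lr=0.01 final gain:", unstable.g);
console.log("lr=0.01 last two losses:", unstable.losses.slice(-2));
```

With the small step the gain settles at the target. With lr 0.01 each update overshoots by far more than the clip range, so the gain jumps between the two bounds and the loss alternates between two values instead of descending, the same qualitative failure as in the note.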