Input is Cameron's real orphans-mix.wav (22.05 kHz mono, 16 s),
decoded with the package's own decodeWav. The ground truth is the
real stems: orphans-vocals.wav (isolated vocal) and
orphans-instrumental.wav (summed non-vocal stems). The master is
the sample-exact sum of the stems and the stems are mutually decorrelated, so
a time-domain Pearson correlation of each estimate against each stem is an
honest recovery metric — a working vocal estimate must correlate
higher with the true vocal than with the true instrumental.
A. REPET-SIM (repetition-based separation, unsupervised):
decompose.nn_filter(|STFT|, median, cosine, width=2 s) →
element-min with S → decompose.softmask (margins 2/10, power 2)
→ istft with mix phase.
B. fingerprint (pleco flagship, supervised — handed
the true vocal's fingerprints): processAudioToFingerprints
→ optimizeEqCurves → reconstructVocal.
Honest scoreboard (node spot-run 2026-07-02, deterministic — the web page recomputes the identical numbers): the raw mix is instrumental-dominated (corr voc 0.335 / ins 0.947 — the vocal is buried). Both separators flip that ordering, but the supervised fingerprint wins vocal fidelity by a clear margin (corr voc 0.744 vs REPET 0.438), while REPET's background still retains ~65 % of the true instrumental energy. Gate = the fingerprint's vocal-vs-instrumental ordering; REPET's numbers are reported, not hidden.