A synthetic 220 Hz tone that steps to 330 Hz, followed by a silent tail:
exactly the pyin reference fixture. pyin(y, 80, 500) builds a
beta/Boltzmann observation matrix over a log-spaced pitch grid plus an
unvoiced state block, then Viterbi-decodes it through the Kronecker product
transition_local ⊗ voiced/unvoiced-switch matrix. The white
contour is the decoded f0; green/gray bands are the decoded voicing.
This is the same math that the pyin.json reference test gates: voiced
f0 lands grid-exact (<0.1 semitone), and the silence decodes as
unvoiced.
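
As a rough illustration of the fixture and the grid-exact check described above, here is a minimal standard-library sketch. It does not call pyin itself. The sample rate, segment durations, and the 10-bins-per-semitone grid layout are assumptions for illustration, as is reading 80 and 500 as the minimum and maximum frequency in Hz; the helper names (make_fixture, pitch_grid, nearest_bin, semitone_error) are hypothetical and not part of any library.

```python
import math

SR = 22050  # assumed sample rate; the description above does not state one


def make_fixture(sr=SR, tone_s=0.5, tail_s=0.25):
    """Build a 220 Hz tone, then a 330 Hz tone, then a silent tail.

    The durations are assumptions chosen only to keep the example small.
    """
    n_tone = int(sr * tone_s)
    n_tail = int(sr * tail_s)
    y = [math.sin(2 * math.pi * 220.0 * n / sr) for n in range(n_tone)]
    y += [math.sin(2 * math.pi * 330.0 * n / sr) for n in range(n_tone)]
    y += [0.0] * n_tail  # silence, expected to decode as unvoiced
    return y


def pitch_grid(fmin=80.0, fmax=500.0, bins_per_semitone=10):
    """Log-spaced pitch grid starting at fmin and staying below fmax.

    Assumed layout: equal steps of 1/bins_per_semitone semitone.
    """
    steps_per_octave = 12 * bins_per_semitone
    n_bins = int(math.floor(steps_per_octave * math.log2(fmax / fmin))) + 1
    return [fmin * 2.0 ** (k / steps_per_octave) for k in range(n_bins)]


def semitone_error(f, f_ref):
    """Absolute distance between two frequencies, in semitones."""
    return abs(12.0 * math.log2(f / f_ref))


def nearest_bin(grid, f):
    """Grid frequency closest to f on the log (semitone) scale."""
    return min(grid, key=lambda g: semitone_error(g, f))


if __name__ == "__main__":
    y = make_fixture()
    grid = pitch_grid()
    for target in (220.0, 330.0):
        snapped = nearest_bin(grid, target)
        print(target, snapped, semitone_error(snapped, target))
```

With steps of a tenth of a semitone, any in-range target is at most 0.05 semitone from its nearest grid point, which is why a decoded f0 that snaps to the grid can stay under the 0.1 semitone bound.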