Skip to content

Glossary

Compact definitions for the domain terms that recur across the guides. Each is scoped to how Pleco-Xa uses it.

STFT (Short-Time Fourier Transform) : The audio split into short overlapping frames, each transformed to the frequency domain. The foundation of nearly every spectral feature — magnitude gives you a spectrogram, and the complex output preserves phase for reconstruction. Pleco-Xa’s stft/istft are fixture-gated (magnitude exact, round-trip verified).

Hop length : The number of samples the STFT window advances between consecutive frames. A smaller hop means more frames (finer time resolution, more computation); a larger hop means fewer. It sets the frame rate of everything downstream — spectrograms, onset envelopes, chroma.

Mel spectrogram : A spectrogram whose frequency axis is warped onto the mel scale, which spaces bands the way human hearing does — roughly linear below ~1 kHz, logarithmic above. The usual input to MFCCs and onset strength.

Chroma : A 12-bin representation that folds all octaves onto the twelve pitch classes (C, C♯, D, …). It captures harmony and melody while ignoring register, which is what makes it useful for structure and recurrence analysis.

Onset strength : A per-frame envelope that peaks when new energy appears — a note attack, a drum hit. Computed from positive spectral flux (frame-to-frame increases in a log-power mel spectrogram). The raw material for beat tracking and tempo.

Tempogram : A time-by-tempo representation showing how strongly each tempo is expressed at each moment — essentially a “spectrogram of rhythm” derived from the onset envelope. Used to estimate and track tempo, including tempo that drifts over time.

Beat tracking : Finding the sequence of beat instants in audio, given (or jointly with) a tempo estimate. Pleco-Xa’s beat_track returns exact frames, pinned by CI fixtures.

HPSS (Harmonic-Percussive Source Separation) : Splitting a spectrogram into a harmonic layer (stable across time — sustained tones) and a percussive layer (stable across frequency — transients), via median filtering along each axis. The masked components sum back to approximately the original.

Recurrence matrix : A frame-by-frame self-similarity map: cell (i, j) marks whether frames i and j are similar (as neighbors, or by affinity). Diagonal stripes reveal repeated sections — the backbone of both structural segmentation and the recurrence loop strategy.

RQA (Recurrence Quantification Analysis) : Quantifying the structure in a recurrence matrix — in particular finding the best diagonal alignment path. Pleco-Xa’s sequence.rqa recovers that path exactly, and the loop recurrence strategy can use an RQA path as a lag candidate.

DTW (Dynamic Time Warping) : An algorithm that finds the lowest-cost alignment between two sequences that may run at different speeds, by warping the time axis. Used to align a performance to a reference. Pleco-Xa’s dtw is bit-exact in cost with an exact warping path.

PCEN (Per-Channel Energy Normalization) : An adaptive gain / dynamic-range compression applied per frequency channel, often used in place of log scaling before onset or event detection. Numerically exact and fixture-gated in CI, and equivalent whether run whole-signal or block-by-block.

Laplacian segmentation : The McFee-Ellis method for finding structural boundaries: build a recurrence graph, take its normalized Laplacian, embed with the smallest eigenvectors, and cluster. Pleco-Xa assembles it from verified eigh, laplacian, and kmeans primitives.

Loop point : The sample-accurate start and end of a region that repeats seamlessly. Finding them — with an honest confidence and clean, click-free boundaries — is Pleco-Xa’s signature capability. See the Loop guide.

Normalized cross-correlation (NCC) : A similarity score in [-1, 1] between two signals after removing their means and normalizing by their standard deviations. Pleco-Xa measures loop confidence by correlating a candidate loop against the audio that follows it — 1 means it repeats verbatim.

Zero crossing : A sample where the waveform crosses zero amplitude. Snapping a loop boundary to the nearest zero crossing removes the click at the seam without moving the point audibly — the job of DynamicZeroCrossing.