Cluster — deterministic k-means
cluster is pleco-xa’s clustering corner. Today it exposes one function — kmeans — but a
carefully faithful one: Lloyd’s expectation-maximisation with greedy k-means++ seeding,
matched to sklearn.cluster.KMeans down to exact labels and bit-exact inertia. Every run is
driven by a seeded mulberry32 PRNG, so results are reproducible bit-for-bit; there is
deliberately no Date.now()/Math.random() fallback. It is the clustering step under
Laplacian segmentation.
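To make the reproducibility claim concrete, here is a minimal sketch of the widely circulated mulberry32 generator. How pleco-xa wires its PRNG into kmeans is not shown on this page, so treat the function below as an illustration of the idea (same seed, same stream) rather than the library's internal code.

```javascript
// Widely circulated mulberry32: 32-bit state, returns floats in [0, 1).
// Illustrative only; pleco-xa's internal wiring is an assumption here.
function mulberry32(seed) {
  let a = seed | 0
  return function next() {
    a = (a + 0x6d2b79f5) | 0
    let t = Math.imul(a ^ (a >>> 15), 1 | a)
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Two generators built from the same seed produce identical sequences.
const rngA = mulberry32(0)
const rngB = mulberry32(0)
const seqA = [rngA(), rngA(), rngA()]
const seqB = [rngB(), rngB(), rngB()]
console.log(seqA.every((v, i) => v === seqB[i])) // true
```

Because the stream depends only on the integer seed and 32-bit integer arithmetic, nothing in it can vary with the clock or the JavaScript engine.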
Fixture-gated against cluster.json (three separable blobs, k=3, generated by
sklearn.cluster.KMeans).
Key functions
Verified against the built barrel (cluster namespace):
- kmeans(X2d, k, opts) → { labels, centers, inertia }. X2d is shape (nSamples, nFeatures) (an array of rows). labels is an Int32Array of cluster indices, centers is k centroid rows, inertia is the summed squared distance to assigned centroids. Options: nInit (restarts, default 10), maxIter (default 300), seed (default 0), tol (default 1e-4, scaled by the mean feature variance like sklearn).
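The two numeric definitions above (inertia, and a tolerance scaled by the mean feature variance) can be sketched as follows. The helper names sqDist, inertiaOf and scaledTol are hypothetical and not part of pleco-xa; the variance is assumed to be the population variance, as in sklearn. The sketch illustrates the formulas only and makes no claim to reproduce the library's bit-exact summation order.

```javascript
// Squared Euclidean distance between two equal-length rows.
function sqDist(a, b) {
  let s = 0
  for (let j = 0; j < a.length; j++) s += (a[j] - b[j]) ** 2
  return s
}

// Inertia: sum over samples of the squared distance to the assigned centroid.
function inertiaOf(X, centers, labels) {
  let total = 0
  for (let i = 0; i < X.length; i++) total += sqDist(X[i], centers[labels[i]])
  return total
}

// Scaled tolerance: tol times the mean of the per-feature population variances.
function scaledTol(X, tol = 1e-4) {
  const n = X.length
  const d = X[0].length
  let varSum = 0
  for (let j = 0; j < d; j++) {
    let mean = 0
    for (let i = 0; i < n; i++) mean += X[i][j]
    mean /= n
    let v = 0
    for (let i = 0; i < n; i++) v += (X[i][j] - mean) ** 2
    varSum += v / n
  }
  return (varSum / d) * tol
}

// Hand-checkable data: every point sits at distance 1 from its centroid.
const X = [[0, 0], [2, 0], [10, 0], [12, 0]]
const centers = [[1, 0], [11, 0]]
const labels = Int32Array.from([0, 0, 1, 1])

console.log(inertiaOf(X, centers, labels)) // 4
console.log(scaledTol(X)) // about 0.0013 (feature variances 26 and 0, mean 13)
```

Scaling tol by the data's variance keeps the convergence threshold meaningful whether the features are measured in thousandths or in thousands.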
Example
import { cluster } from 'pleco-xa'

const X = [
  [0.0, 0.0],
  [0.1, 0.0],
  [5.0, 5.0],
  [5.1, 5.0],
  [5.0, 5.1],
]

const { labels, centers, inertia } = cluster.kmeans(X, 2, { seed: 0 })
// labels: Int32Array([0, 0, 1, 1, 1]); the same seed always yields the same result

- Determinism is total. With a fixed seed, labels, centers, and inertia are reproducible bit-for-bit across engines. If you want different initialisations, change the seed, not the wall clock.
- k must lie in [1, nSamples], and every value in X2d must be finite; both violations throw with a diagnostic rather than degrading silently.
- Seeding is greedy k-means++ (2 + floor(log k) local trials per centre) and empty clusters are relocated onto the worst-fit points, matching sklearn’s relocate_empty_clusters behaviour; that is what keeps labels and inertia exact against the fixture.
- nInit independent restarts are run and the lowest-inertia result is kept, exactly as sklearn does.
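The input checks and the local-trial count described in the notes above can be sketched like this. Both helper names (assertValidInput, localTrials) and the error types and messages are hypothetical; pleco-xa's actual diagnostics are not shown on this page. The logarithm is assumed to be the natural log, following sklearn's use of np.log.

```javascript
// Hypothetical guard mirroring the documented contract:
// k in [1, nSamples] and every value in X2d finite.
function assertValidInput(X2d, k) {
  const nSamples = X2d.length
  if (!Number.isInteger(k) || k < 1 || k > nSamples) {
    throw new RangeError(`k must be an integer in [1, ${nSamples}], got ${k}`)
  }
  for (let i = 0; i < nSamples; i++) {
    for (let j = 0; j < X2d[i].length; j++) {
      if (!Number.isFinite(X2d[i][j])) {
        throw new TypeError(`X2d[${i}][${j}] is not finite: ${X2d[i][j]}`)
      }
    }
  }
}

// Greedy k-means++ candidate count per centre: 2 + floor(ln k).
function localTrials(k) {
  return 2 + Math.floor(Math.log(k))
}

console.log([1, 3, 8].map(localTrials)) // [ 2, 3, 4 ]
```

Failing fast on a NaN or an out-of-range k is what keeps a bad input from surfacing later as a silently wrong clustering.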
API reference
Full signature: cluster namespace, kmeans.