Cluster — deterministic k-means

cluster is pleco-xa’s clustering module. Today it exposes one function, kmeans, written to be faithful to its reference: Lloyd’s algorithm (an EM-style alternation of assignment and centroid update) with greedy k-means++ seeding, matched to sklearn.cluster.KMeans down to exact labels and bit-exact inertia. Every run is driven by a seeded mulberry32 PRNG, so results are reproducible bit-for-bit; there is deliberately no Date.now()/Math.random() fallback. It is the clustering step under Laplacian segmentation.
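To make the seeding idea concrete, here is a minimal sketch of the widely circulated mulberry32 generator. It is an illustration only: pleco-xa's internal generator may differ in detail, and the function and variable names below are assumptions rather than part of the library's API.

```javascript
// Sketch of the well-known mulberry32 PRNG: a 32-bit seed in,
// a function returning floats in [0, 1) out.
// Assumption: not copied from pleco-xa; shown only to illustrate seeded determinism.
function mulberry32(seed) {
  let state = seed >>> 0
  return function next() {
    // Advance the 32-bit state by a fixed odd constant.
    state = (state + 0x6d2b79f5) | 0
    // Scramble the state with integer multiplies and xor-shifts.
    let t = Math.imul(state ^ (state >>> 15), state | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    // Map the unsigned 32-bit result into [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Two generators built from the same seed produce the same sequence.
const genA = mulberry32(0)
const genB = mulberry32(0)
const seqA = [genA(), genA(), genA()]
const seqB = [genB(), genB(), genB()]
```

Because the only state is the seed, nothing here depends on the clock or on Math.random(), which is the property the paragraph above relies on.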

Fixture-gated against cluster.json (three separable blobs, k=3, generated by sklearn.cluster.KMeans).

Verified against the built barrel (cluster namespace):

  • kmeans(X2d, k, opts) → { labels, centers, inertia }. X2d is shape (nSamples, nFeatures) (an array of rows). labels is an Int32Array of cluster indices, centers is k centroid rows, inertia is the summed squared distance to assigned centroids. Options: nInit (restarts, default 10), maxIter (default 300), seed (default 0), tol (default 1e-4, scaled by the mean feature variance like sklearn).
import { cluster } from 'pleco-xa'
const X = [
  [0.0, 0.0],
  [0.1, 0.0],
  [5.0, 5.0],
  [5.1, 5.0],
  [5.0, 5.1],
]
const { labels, centers, inertia } = cluster.kmeans(X, 2, { seed: 0 })
// labels: Int32Array([0, 0, 1, 1, 1]) — same seed always yields the same result
  • Determinism is total. With a fixed seed, labels, centers, and inertia are reproducible bit-for-bit across engines. If you want different initialisations, change the seed — not the wall clock.
  • k must lie in [1, nSamples], and every value in X2d must be finite; both violations throw with a diagnostic rather than degrading silently.
  • Seeding is greedy k-means++ (2 + floor(ln k) local trials per centre, using the natural log as sklearn does), and empty clusters are relocated onto the worst-fit points, matching sklearn’s relocate_empty_clusters behaviour. That is what keeps labels and inertia exact against the fixture.
  • nInit independent restarts are run and the lowest-inertia result is kept, exactly as sklearn does.
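
The quantities named in the list above can be sketched in plain JavaScript. This is an illustrative sketch under stated assumptions, not pleco-xa's source: the helper names (assertValidInput, inertiaOf, scaledTol, localTrials, pickBest) are hypothetical, the error types are a guess, and the formulas follow what the list states plus sklearn's behaviour (population variance per feature, natural log for the trial count).

```javascript
// Hypothetical helpers; the names are illustrative and not part of pleco-xa's API.

// Reject the two inputs the list says throw: k outside [1, nSamples] and non-finite values.
function assertValidInput(X, k) {
  const nSamples = X.length
  if (!Number.isInteger(k) || k < 1 || k > nSamples) {
    throw new RangeError(`k must lie in [1, ${nSamples}], got ${k}`)
  }
  for (const row of X) {
    for (const v of row) {
      if (!Number.isFinite(v)) throw new TypeError('X must contain only finite values')
    }
  }
}

// Inertia: summed squared Euclidean distance from each sample to its assigned centroid.
function inertiaOf(X, labels, centers) {
  let total = 0
  for (let i = 0; i < X.length; i++) {
    const c = centers[labels[i]]
    for (let j = 0; j < c.length; j++) {
      const d = X[i][j] - c[j]
      total += d * d
    }
  }
  return total
}

// sklearn-style tolerance: tol multiplied by the mean per-feature population variance.
function scaledTol(X, tol = 1e-4) {
  const n = X.length
  const nFeatures = X[0].length
  let varianceSum = 0
  for (let j = 0; j < nFeatures; j++) {
    let mean = 0
    for (let i = 0; i < n; i++) mean += X[i][j]
    mean /= n
    let variance = 0
    for (let i = 0; i < n; i++) variance += (X[i][j] - mean) ** 2
    varianceSum += variance / n
  }
  return (varianceSum / nFeatures) * tol
}

// Greedy k-means++ candidate count per centre: 2 + floor(ln k).
function localTrials(k) {
  return 2 + Math.floor(Math.log(k))
}

// Keep the lowest-inertia run out of several restarts (first one wins ties).
function pickBest(runs) {
  return runs.reduce((best, run) => (run.inertia < best.inertia ? run : best))
}

// Usage on a two-point toy input with a single centroid at the midpoint.
const toyX = [[0, 0], [2, 0]]
assertValidInput(toyX, 1)
const toyInertia = inertiaOf(toyX, [0, 0], [[1, 0]]) // 1^2 + 1^2 = 2
```

In this sketch pickBest uses a strict comparison, so the earliest restart wins a tie; whether pleco-xa breaks ties the same way is not stated in the text above.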

Full signature: cluster namespace → kmeans.