kmeans

kmeans(X2d, k, options?): object

Defined in: packages/pleco-xa/src/cluster/kmeans.js:34

K-means clustering — Lloyd’s algorithm with greedy k-means++ seeding.

Faithful port of scikit-learn’s sklearn.cluster.KMeans (algorithm="lloyd"):

  • greedy k-means++ initialization with 2 + floor(log k) local trials (Arthur & Vassilvitskii 2007; sklearn _kmeans_plusplus, _kmeans.py l.180)
  • Lloyd expectation-maximization with strict-label and center-shift tolerance convergence (sklearn _kmeans_single_lloyd, _kmeans.py l.630)
  • nInit independent restarts, keeping the lowest-inertia result
  • dataset-scaled tolerance mean(var(X, axis=0)) * tol (sklearn _tolerance)
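Two of the quantities in the list above are simple enough to sketch directly. The helpers below are illustrative only: the names nLocalTrials and scaledTolerance are hypothetical and are not taken from kmeans.js, and the variance is assumed to be the population variance (ddof = 0, NumPy's default), which is what mean(var(X, axis=0)) implies.

```javascript
// Number of candidate centers sampled per greedy k-means++ step.
// Math.log is the natural logarithm, as in the "2 + floor(log k)" rule.
function nLocalTrials(k) {
  return 2 + Math.floor(Math.log(k));
}

// Absolute tolerance: mean over features of the per-feature population
// variance, multiplied by the relative tolerance tol.
// X is an array of equal-length rows (plain arrays or typed arrays).
function scaledTolerance(X, tol) {
  const nSamples = X.length;
  const nFeatures = X[0].length;
  let varianceSum = 0;
  for (let j = 0; j < nFeatures; j++) {
    let mean = 0;
    for (let i = 0; i < nSamples; i++) mean += X[i][j];
    mean /= nSamples;
    let variance = 0;
    for (let i = 0; i < nSamples; i++) {
      const d = X[i][j] - mean;
      variance += d * d;
    }
    varianceSum += variance / nSamples; // ddof = 0
  }
  return (varianceSum / nFeatures) * tol;
}
```

With k = 3 this gives 3 local trials, and with k = 8 it gives 4. Because the tolerance scales with the data's variance, rescaling every feature by a constant changes the absolute threshold but not the point at which the run is considered converged.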

Determinism is total: the run is driven by a seeded mulberry32 PRNG. With a fixed seed the labels, centers and inertia are reproducible bit-for-bit; there is deliberately NO Date.now()/Math.random() fallback.
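For readers unfamiliar with mulberry32, the following is a minimal sketch of the commonly circulated form of that generator. It is an assumption that the library's own implementation matches this variant exactly; the point is only to show why a fixed seed yields an identical stream of numbers on every run.

```javascript
// mulberry32: a small 32-bit PRNG. Returns a function that yields
// floats in [0, 1). All state lives in the closed-over integer `a`.
function mulberry32(seed) {
  let a = seed | 0;
  return function next() {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators built from the same seed produce the same sequence.
const rngA = mulberry32(0);
const rngB = mulberry32(0);
const sameStream = [rngA(), rngA(), rngA()].every((v) => v === rngB());
```

Since the generator uses only 32-bit integer arithmetic and one division by a power of two, its output does not depend on the clock or on Math.random().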

Validated against committed reference fixtures (three separable blobs, k=3, generated by sklearn.cluster.KMeans).

X2d: ArrayLike<number>[]

Observations, shape (nSamples, nFeatures). Each row is a plain array or a typed array; all rows must share the same length.

k: number

Number of clusters (1 ≤ k ≤ nSamples).

number = 300

Maximum number of Lloyd iterations per restart (≥ 1).

number = 10

Number of independent restarts, each seeded with k-means++ (≥ 1); the lowest-inertia result is kept.

number = 0

Seed for the mulberry32 PRNG; a fixed seed makes the whole run reproducible.

number = 1e-4

Relative center-shift tolerance; it is multiplied by the mean per-feature variance of the data, matching sklearn.

object

labels[i] is the cluster index of observation i, centers[c] is the centroid of cluster c, and inertia is the summed squared distance of every observation to its assigned centroid.

centers: number[][]

inertia: number

labels: Int32Array
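The relationship between the three returned fields can be checked by recomputing inertia from labels and centers. This is a sketch under the definitions given above; inertiaOf is a hypothetical helper, not part of the library.

```javascript
// Sum of squared Euclidean distances from each observation to the
// centroid of the cluster it is assigned to.
function inertiaOf(X, labels, centers) {
  let total = 0;
  for (let i = 0; i < X.length; i++) {
    const center = centers[labels[i]];
    for (let j = 0; j < center.length; j++) {
      const d = X[i][j] - center[j];
      total += d * d;
    }
  }
  return total;
}

// Hand-built example shaped like the documented return value.
const X = [[0, 0], [0, 2], [10, 10]];
const result = {
  labels: Int32Array.from([0, 0, 1]),
  centers: [[0, 1], [10, 10]],
};
const recomputed = inertiaOf(X, result.labels, result.centers); // 2
```

Comparing a recomputed value like this against the returned inertia (up to floating-point rounding) is a quick sanity check when validating results against fixtures.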