K-Means

The k-means clustering computed by Lloyd's algorithm can be generalized to manifolds. Its first step assigns each data point to its closest center, which can be done in terms of the geodesic distance. Its second step, computing the mean within each cluster, generalizes to computing the Riemannian center of mass [Karcher1977].
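The two steps can be illustrated on the unit sphere, where the geodesic distance and the exponential and logarithmic maps have closed forms. The following is a minimal sketch under that assumption, not the implementation used by this package: all function names (`sphere_distance`, `sphere_log`, `sphere_exp`, `karcher_mean`, `sphere_kmeans`) are hypothetical, and the Riemannian center of mass, i.e. the minimizer of the sum of squared geodesic distances, is approximated by a simple fixed-point iteration with a fixed number of steps.

```julia
using LinearAlgebra

# Geodesic (great-circle) distance on the unit sphere.
sphere_distance(p, q) = acos(clamp(dot(p, q), -1.0, 1.0))

# Logarithmic map: tangent vector at p pointing towards q with length d(p, q).
# Antipodal points are not handled in this sketch.
function sphere_log(p, q)
    v = q - dot(p, q) * p            # project q onto the tangent space at p
    nv = norm(v)
    nv < 1e-12 && return zero(p)     # p and q coincide
    return (sphere_distance(p, q) / nv) * v
end

# Exponential map: follow the geodesic from p in direction v for length norm(v).
function sphere_exp(p, v)
    nv = norm(v)
    nv < 1e-12 && return p
    return cos(nv) * p + (sin(nv) / nv) * v
end

# Approximate the Riemannian center of mass by a fixed-point iteration
# (gradient descent with step size 1 on the sum of squared distances).
function karcher_mean(pts; iterations=50)
    m = pts[1]
    for _ in 1:iterations
        g = sum(sphere_log(m, q) for q in pts) / length(pts)
        m = normalize(sphere_exp(m, g))   # renormalize against rounding drift
    end
    return m
end

# Lloyd's algorithm with geodesic distances and Riemannian means.
function sphere_kmeans(pts, centers; iterations=20)
    centers = copy(centers)
    assignment = zeros(Int, length(pts))
    for _ in 1:iterations
        # Step 1: assign every point to its closest center.
        for (i, p) in enumerate(pts)
            assignment[i] = argmin([sphere_distance(p, c) for c in centers])
        end
        # Step 2: move each center to the mean of its cluster (skip empty clusters).
        for k in eachindex(centers)
            cluster = pts[assignment .== k]
            isempty(cluster) || (centers[k] = karcher_mean(cluster))
        end
    end
    return assignment, centers
end

# Two small groups of points, one near the x-axis and one near the z-axis.
pts = [
    normalize([1.0, 0.1, 0.0]), normalize([1.0, -0.1, 0.0]),
    normalize([0.0, 0.1, 1.0]), normalize([0.0, -0.1, 1.0]),
]
assignment, centers = sphere_kmeans(pts, [pts[1], pts[3]])
```

With these symmetric inputs the first two points end up in cluster 1 and the last two in cluster 2. A practical implementation would replace the fixed iteration counts with a stopping criterion and choose the initial centers more carefully.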

ManifoldML.KMeansOptions – Type
KMeansOptions <: Options

Collects the data needed during the computation of the k-means clustering, namely

  • points::Vector{P} – the given data
  • centers::Vector{P} – the cluster centers
  • assignment::Vector{<:Int} – a vector of the same length as points that assigns each point to a cluster
  • stop::StoppingCriterion – a stopping criterion

Here P is the data type of points on the manifold on which the points (and centers) live. This manifold is stored in the KMeansProblem.

Constructor

KMeansOptions(
    points::Vector{P},
    centers::Vector{P},
    stop::StoppingCriterion=StopAfterIteration(100)
)

Initialize the options. The assignment vector is initially filled with zeros; the actual cluster assignment is computed at the beginning of the algorithm.

ManifoldML.kmeans – Method
kmeans(
    M::Manifold,
    pts::Vector{P};
    num_centers=5,
    centers=pts[1:num_centers],
    stop=StopAfterIteration(100),
    kwargs...
)

Compute a simple k-means clustering of the points pts on a Riemannian manifold M. The number of clusters num_centers defaults to 5, and the initial centers default to the first num_centers data points. By default, the algorithm stops after 100 iterations.

The kwargs... can be used to initialize RecordOptions or DebugOptions decorators from Manopt.jl.

Returns the final KMeansOptions, which include the final assignment vector and the centers.

Literature

  • Karcher1977

    Karcher, H.: Riemannian center of mass and mollifier smoothing, Communications on Pure and Applied Mathematics 30(5), 1977, pp. 509–541. doi: 10.1002/cpa.3160300502