K-Means
The k-means clustering computed with Lloyd's algorithm can be generalized to manifolds. Its first step assigns each data point to its closest center, which can be done in terms of the geodesic distance. Its second step, computing the mean within each cluster, is generalized to computing the Riemannian center of mass [Karcher1977].
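The two steps can be sketched without any package on the unit sphere, where the geodesic distance and the exponential and logarithmic maps have closed forms. This is only a minimal illustration under that assumption, not the implementation used by ManifoldML; the names sphere_distance, sphere_log, sphere_exp, karcher_mean and sphere_kmeans are hypothetical helpers introduced here, and the fixed iteration counts stand in for a proper stopping criterion.

```julia
using LinearAlgebra

# Geodesic (great-circle) distance on the unit sphere.
sphere_distance(p, q) = acos(clamp(dot(p, q), -1.0, 1.0))

# Logarithmic map: tangent vector at p pointing towards q with norm d(p, q).
function sphere_log(p, q)
    v = q - dot(p, q) * p              # project q onto the tangent space at p
    n = norm(v)
    return n < 1e-12 ? zero(p) : (sphere_distance(p, q) / n) * v
end

# Exponential map: follow the geodesic from p in direction X for length norm(X).
function sphere_exp(p, X)
    n = norm(X)
    return n < 1e-12 ? p : cos(n) * p + (sin(n) / n) * X
end

# Riemannian center of mass, approximated by a fixed number of
# gradient steps (step size 1) on the sum of squared distances.
function karcher_mean(pts; iterations=50)
    m = pts[1]
    for _ in 1:iterations
        X = sum(sphere_log(m, q) for q in pts) / length(pts)
        m = sphere_exp(m, X)
    end
    return m
end

# Lloyd's algorithm with both steps expressed through the geometry.
function sphere_kmeans(pts, centers; iterations=20)
    centers = copy(centers)
    assignment = zeros(Int, length(pts))
    for _ in 1:iterations
        # Step 1: assign every point to its geodesically closest center.
        for (i, p) in enumerate(pts)
            assignment[i] = argmin([sphere_distance(p, c) for c in centers])
        end
        # Step 2: replace every center by the Riemannian mean of its cluster.
        for k in eachindex(centers)
            members = pts[assignment .== k]
            isempty(members) || (centers[k] = karcher_mean(members))
        end
    end
    return centers, assignment
end

# Four points: two near the north pole, two near the point (1, 0, 0).
north = [0.0, 0.0, 1.0]
east = [1.0, 0.0, 0.0]
pts = [
    normalize(north + [0.1, 0.0, 0.0]),
    normalize(east + [0.0, 0.1, 0.0]),
    normalize(north + [-0.1, 0.05, 0.0]),
    normalize(east + [0.0, -0.1, 0.05]),
]
centers, assignment = sphere_kmeans(pts, pts[1:2])
```

As in the function documented below, the initial centers are simply the first data points. An empty cluster keeps its previous center in this sketch, which is one simple choice among several.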
ManifoldML.KMeansOptions — Type
KMeansOptions <: Options
Collect the data necessary during computation of the k-means clustering, i.e.
- points::Vector{P} – the given data
- centers::Vector{P} – the cluster centers
- assignment::Vector{<:Int} – a vector of the same length as points, assigning each of them to a cluster
- stop::StoppingCriterion – a stopping criterion
Here P is the data type of points on the manifold on which the points (and centers) live. This manifold is stored in the KMeansProblem.
Constructor
KMeansOptions(
points::Vector{P},
centers::Vector{P},
stop::StoppingCriterion=StopAfterIteration(100)
)
Initialize the options. The assignment vector is set to zeros and is filled in at the beginning of the algorithm.
ManifoldML.KMeansProblem — Type
KMeansProblem <: Problem
Store the fixed data necessary for k-means, i.e. only a Manifold M.
ManifoldML.kmeans — Method
kmeans(
M::Manifold,
pts::Vector{P};
num_centers=5,
centers=pts[1:num_centers],
stop=StopAfterIteration(100),
kwargs...
)
Compute a simple k-means clustering on a Riemannian manifold M for the points pts. The number of clusters num_centers defaults to 5, and the initial centers default to the first num_centers data points. By default, the algorithm stops after 100 iterations.
The kwargs... can be used to initialize the RecordOptions or DebugOptions decorators from Manopt.jl.
Returns the final KMeansOptions including the final assignment vector and the centers.
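A call following the signature above might look like the sketch below. It assumes that ManifoldML.jl, Manifolds.jl and Manopt.jl are installed in versions matching this page, in particular that Sphere(2) from Manifolds.jl is accepted as the Manifold argument; the random data and the variable names M, pts and o are made up for illustration.

```julia
using LinearAlgebra: normalize
using ManifoldML, Manifolds, Manopt

# Assumption: the 2-sphere from Manifolds.jl is a valid `Manifold` here.
M = Sphere(2)

# Illustrative data: 20 random unit vectors in R^3, i.e. points on the sphere.
pts = [normalize(randn(3)) for _ in 1:20]

# Cluster into 3 groups; the first 3 points serve as the initial centers.
o = ManifoldML.kmeans(M, pts; num_centers=3, stop=StopAfterIteration(50))

o.centers      # final cluster centers (field of the returned KMeansOptions)
o.assignment   # cluster index for every entry of pts
```

The fields accessed at the end are the ones listed for KMeansOptions; since the initial centers are taken from the start of pts, shuffling the data beforehand changes the initialization.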
Literature
- Karcher1977
Karcher, H.: Riemannian center of mass and mollifier smoothing, Communications on Pure and Applied Mathematics 30(5), 1977, pp. 509–541. doi: 10.1002/cpa.3160300502