Optimize on your own manifold

Ronny Bergmann

When you have used a few solvers from Manopt.jl, for example in the opening tutorial 🏔️ Get started: optimize!, and also familiarized yourself with how to work with manifolds in general at 🚀 Get Started with Manifolds.jl, you might come to the point where you want to implement a manifold yourself and use it within Manopt.jl. A challenge might be figuring out which functions are necessary, since implementing the complete interface of ManifoldsBase.jl is usually not required.

This tutorial aims to help you through the steps needed to implement the necessary parts of a manifold, so you can get started with the solver you have in mind.

An example problem

We get started by loading the packages we need.

using LinearAlgebra, Manifolds, ManifoldsBase, Random
using Manopt
Random.seed!(42)

We also define the same manifold as in the implementing a manifold tutorial.

"""
    ScaledSphere <: AbstractManifold{ℝ}

Define a sphere of fixed radius

# Fields

* `dimension` dimension of the sphere
* `radius` the radius of the sphere

# Constructor

    ScaledSphere(dimension,radius)

Initialize the manifold to a certain `dimension` and `radius`,
which by default is set to `1.0`
"""
struct ScaledSphere <: AbstractManifold{ℝ}
    dimension::Int
    radius::Float64
end
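
The docstring above mentions a default radius of 1.0. Since the plain struct does not provide that on its own, a one-line outer constructor, which is an addition to the listing above, can supply it:

ScaledSphere(dimension::Int) = ScaledSphere(dimension, 1.0) # convenience constructor with the default radius 1.0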

We would like to compute a mean and/or median similar to 🏔️ Get started: optimize!. Given a set of points $q_1,\ldots,q_n$ we want to compute [Kar77]

\[ \operatorname*{arg\,min}_{p∈\mathcal M} \frac{1}{2n} \sum_{i=1}^n d_{\mathcal M}^2(p, q_i)\]

on the ScaledSphere we just defined above. We define a few parameters first

d = 5  # dimension of the sphere - embedded in R^{d+1}
r = 2.0 # radius of the sphere
N = 100 # data set size

M = ScaledSphere(d,r)
ScaledSphere(5, 2.0)

We next generate a few points

# generate 100 points around the north pole
pts = [ [zeros(d)..., M.radius] .+ 0.5.*([rand(d)...,0.5] .- 0.5) for _=1:N]
# project them onto the r-sphere
pts = [ r/norm(p) .* p for p in pts]
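
As a small check, and as an addition to the code above, we can verify that all generated points indeed lie on the sphere of radius r:

all(p -> isapprox(norm(p), r), pts) # should return true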

Then, before starting with optimization, we need the distance on the manifold to define the cost function, as well as the logarithmic map to define the gradient. For both we use the “lazy” approach of falling back to the (unit) Sphere. Finally, we have to provide information about how points and tangent vectors are stored on the manifold by implementing the representation_size function, which is often required when allocating memory.

import ManifoldsBase: distance, log, representation_size
function distance(M::ScaledSphere, p, q)
    return M.radius * distance(Sphere(M.dimension), p ./ M.radius, q ./ M.radius)
end
function log(M::ScaledSphere, p, q)
    return M.radius * log(Sphere(M.dimension), p ./ M.radius, q ./ M.radius)
end
representation_size(M::ScaledSphere) = (M.dimension+1,)
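
As a quick plausibility check, not part of the original code, the norm of a logarithmic map should coincide with the corresponding distance, since both are inherited from the (scaled) unit sphere:

X = log(M, pts[1], pts[2])
isapprox(norm(X), distance(M, pts[1], pts[2])) # should return true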

Define the cost and gradient

f(M, q) = sum(distance(M, q, p)^2 for p in pts)
grad_f(M,q) = sum( - log(M, q, p) for p in pts)
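
Note that grad_f returns $-\sum_{i=1}^n \log_p q_i$, which is the Riemannian gradient of $\frac{1}{2}\sum_{i=1}^n d_{\mathcal M}^2(p, q_i)$, that is, of our cost up to a constant factor; this scaling does not change the minimizer. As a small sanity check, again an addition to the original code, the gradient at a point on the sphere should be a tangent vector, that is, orthogonal to that point in the embedding:

q = pts[1]
abs(dot(grad_f(M, q), q)) < 1e-10 # should return true, the gradient is tangent at q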

Defining the necessary functions to run a solver

The documentation usually lists the necessary functions in a section “Technical Details” close to the end of a solver's documentation; in our case that is The gradient descent’s Technical Details.

That section lists all the details, but we can also proceed step by step here if we are a bit careful.

A retraction

We first implement a retraction. Informally, given a current point and a direction to “walk into”, we need a function that performs that walk. We choose a simple one that just projects onto the sphere, so we use the ProjectionRetraction type. To be precise, we have to implement the in-place variant retract_project!

import ManifoldsBase: retract_project!
function retract_project!(M::ScaledSphere, q, p, X, t::Number)
    q .= p .+ t .* X
    q .*= M.radius / norm(q)
    return q
end
retract_project! (generic function with 18 methods)
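
Before using it in a solver, we can try the new retraction directly, as a small check that is not part of the original tutorial: moving from one data point a bit into the direction of another should again yield a point on the sphere.

q = similar(pts[1])
retract_project!(M, q, pts[1], log(M, pts[1], pts[2]), 0.5)
isapprox(norm(q), r) # should return true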

The other two technical remarks refer to the step size and the stopping criterion, so if we set these to something simpler, we should already be able to do a first run.

We have to specify

  • that we want to use the new retraction,
  • a simple step size and stopping criterion

We start at a certain point and compute its cost

p0 = [zeros(d)...,1.0]
f(M,p0)
444.60374551157634

Then we can run our first solver, where we have to overwrite a few defaults, which would use functions we do not (yet) have. We will discuss these in the next steps.

q1 = gradient_descent(M, f, grad_f, p0;
    retraction_method = ProjectionRetraction(),   # state that we use the retraction from above
    stepsize = DecreasingStepsize(M; length=1.0), # A simple step size
    stopping_criterion = StopAfterIteration(10),  # A simple stopping criterion
    X = zeros(d+1),                               # how we define/represent tangent vectors
)
f(M,q1)
162.4000287847332

We at least see that the function value decreased.

Norm and maximal step size

To use more advanced stopping criteria and step sizes we first need an inner product inner(M, p, X, Y). We also need max_stepsize(M), to avoid taking too large steps on positively curved manifolds like our scaled sphere in this example.

import ManifoldsBase: inner
import Manopt: max_stepsize
inner(M::ScaledSphere, p, X, Y) = dot(X, Y) # inherited from the embedding
# set the maximal allowed step size to the injectivity radius
Manopt.max_stepsize(M::ScaledSphere) = M.radius*π
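
Since ManifoldsBase provides norm(M, p, X) generically on top of inner, we can now, for example, compute the gradient norm at our start point, which is the quantity the default stopping criterion monitors; this is just a small illustration and not part of the original code.

norm(M, p0, grad_f(M, p0)) # the gradient norm at the start point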

Then we can use the default step size (ArmijoLinesearch) and the default stopping criterion, which checks for a small gradient norm.

q2 = gradient_descent(M, f, grad_f, p0;
    retraction_method = ProjectionRetraction(), # as before
    X = zeros(d+1), # as before
)
f(M, q2)
9.772830131357024

Making life easier: default retraction and zero vector

To initialize tangent vector memory, the function zero_vector(M, p) is called. Similarly, the retraction to use by default is returned by default_retraction_method.

We can use both here, to make subsequent calls to the solver less verbose. We define

import ManifoldsBase: zero_vector, default_retraction_method
zero_vector(M::ScaledSphere, p) = zeros(M.dimension+1)
default_retraction_method(M::ScaledSphere) = ProjectionRetraction()
default_retraction_method (generic function with 20 methods)
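
As a quick check, and assuming that the generic retract from ManifoldsBase falls back to the one-argument default_retraction_method we just defined, we can now retract without specifying a retraction method:

q = retract(M, pts[1], log(M, pts[1], pts[2])) # uses the ProjectionRetraction by default
isapprox(norm(q), r) # should return true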

and now we can even just call

q3 = gradient_descent(M, f, grad_f, p0)
f(M, q3)
9.772830131357024

But we also automatically get further features, for example the possibility to obtain debug information like

gradient_descent(M, f, grad_f, p0; debug = [:Iteration, :Cost, :Stepsize, 25, :GradientNorm, :Stop, "\n"]);
Initial f(x): 444.603746
# 25    f(x): 9.772836s:0.01851892669284175|grad f(p)|:0.027709997662907597
# 50    f(x): 9.772830s:0.01851892669284175|grad f(p)|:0.00035050862606229735
# 75    f(x): 9.772830s:0.01671333134028968|grad f(p)|:3.5186467891885363e-6
The algorithm reached approximately critical point after 81 iterations; the gradient norm (8.510619620990863e-9) is less than 1.0e-8.

see How to Print Debug Output for more details.

Literature

[Kar77]
H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics 30, 509–541 (1977).