An NCRPG run on the Grassmann manifold
Hajg Jasa 2025-05-09
Introduction
In this example we run the Nonconvex Riemannian Proximal Gradient (NCRPG) method [BJJP25b] on the Grassmann manifold, comparing a constant stepsize with a backtracked one; we also set up the proximal maps needed by the Cyclic Proximal Point Algorithm (CPPA), which was introduced in [Bac14]. This example reproduces the results from [BJJP25b], Section 5.4.
using PrettyTables
using BenchmarkTools
using CSV, DataFrames
using ColorSchemes, Plots
using Random, LinearAlgebra, LRUCache
using ManifoldDiff, Manifolds, Manopt, ManoptExamples
The Problem
Let $\mathcal M$ be a Riemannian manifold and let $q_1,\ldots,q_N \in \mathcal M$ denote $N = 1000$ Gaussian random data points. Let $g \colon \mathcal M \to \mathbb R$ be defined by
\[g(p) = \sum_{j = 1}^N w_j \, \mathrm{dist}(p, q_j)^2,\]
where $w_j$, $j = 1, \ldots, N$, are positive weights; in our experiments, we choose $w_j = \frac{1}{2N}$. Since the Grassmann manifold has nonnegative sectional curvature bounded above by a constant $k_{\max}$, the function $g$ is strongly convex with respect to the Riemannian metric only locally, on geodesic balls that are small enough relative to this curvature bound; the data below are generated to lie within a ball of radius $\frac{\pi}{2\sqrt{k_{\max}}}$ around a common anchor point for precisely this reason. The Riemannian center of mass (weighted Fréchet mean) $p^*$ of the dataset
\[\mathcal D = \{ q_1,\ldots,q_N \, \vert \, q_j \in \mathcal M\text{ for all } j = 1,\ldots,N \}\]
is then defined as
\[ p^* \coloneqq \operatorname*{arg\,min}_{p \in \mathcal M} g(p),\]
where the $\operatorname{arg\,min}$ is well defined since $p^*$ is uniquely determined on a ball on which $g$ is strongly convex.
Now let $\bar q \in \mathcal M$ be a given point, and let $h \colon \mathcal M \to \mathbb R$ be defined by
\[h(p) = \alpha \mathrm{dist}(p, \bar q).\]
We define our total objective function as $f = g + h$. On the geodesic balls discussed above, $f$ inherits strong convexity from $g$. The goal is to find the minimizer of $f$ on $\mathcal M$, which heuristically interpolates between the center of mass $p^*$ and $\bar q$.
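Since $h$ is a positive multiple of the distance to $\bar q$, its proximal map has a well-known closed form: it moves $p$ towards $\bar q$ along the minimizing geodesic and never overshoots $\bar q$. Assuming $p \neq \bar q$ are connected by a unique minimizing geodesic, it reads
\[\operatorname{prox}_{\lambda h}(p) = \exp_p\Big( \min\Big\{\frac{\lambda\alpha}{\mathrm{dist}(p, \bar q)},\, 1\Big\}\, \log_p \bar q \Big),\]
which is what the call to `ManifoldDiff.prox_distance` in the code below relies on.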
Numerical Experiment
We initialize the experiment parameters, as well as some utility functions.
random_seed = 100
experiment_name = "NCRPG-Grassmann"
results_folder = joinpath(@__DIR__, experiment_name)
!isdir(results_folder) && mkdir(results_folder)
atol = 1e-7
max_iters = 5000
N = 1000 # number of data points
α = 1/2 # weight for the median component (h)
δ = 1e-2 # parameter for the estimation of the constant stepsize
σ = 1.0 # standard deviation for the Gaussian data points
k_max_gr = 2.0 # upper bound on the sectional curvature of the Grassmann manifold
gr_dims = [(5, 2), (10, 4), (50, 10), (100, 20), (200, 40)] # sizes (n, m) of the Grassmann manifolds Gr(n, m)
benchmarking = true # whether to time the solver runs with BenchmarkTools below
export_table = true # whether to append the results to the CSV file below
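# A quick, optional sanity check: the Dimension column of the results table below is the
# manifold dimension of Gr(n, m), which for the real Grassmann manifold equals m * (n - m).
[manifold_dimension(Grassmann(n, m)) for (n, m) in gr_dims] # returns [6, 24, 400, 1600, 6400]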
# Objective, gradient, and proxes
g(M, p, data) = 1/2length(data) * sum(distance.(Ref(M), data, Ref(p)).^2)
grad_g(M, p, data) = 1/length(data) * sum(ManifoldDiff.grad_distance.(Ref(M), data, Ref(p), 2))
#
h(M, p, q) = α * distance(M, p, q)
prox_h(M, λ, p, q) = ManifoldDiff.prox_distance(M, α * λ, q, p, 1)
#
f(M, p, data, q) = g(M, p, data) + h(M, p, q)
# CPPA needs the proximal operators for the total objective
function proxes_f(data, q)
proxes = Function[(M, λ, p) -> ManifoldDiff.prox_distance(M, λ / length(data), di, p, 2) for di in data]
push!(proxes, (M, λ, p) -> ManifoldDiff.prox_distance(M, α * λ, q, p, 1))
return proxes
end
# Function to shorten vectors if they are too long (for convexity reasons)
function shorten_vectors!(M, p, vectors)
# If the i-th vector is of length greater than π/(2 * √(k_max_gr)), randomly shorten it
# to a length between 0 and π/(2 * √(k_max_gr)) (excluded)
for i in 1:length(vectors)
if norm(M, p, vectors[i]) ≥ π/(2 * √(k_max_gr))
# Randomly shorten the vector to a length between 0 and π/(2 * √(k_max_gr))
new_length = rand() * π/(2 * √(k_max_gr))
vectors[i] = new_length * (vectors[i] / norm(M, p, vectors[i]))
end
end
return vectors
end
# Function to generate points close to the given point p
function close_point(M, p, tol; retraction_method=Manifolds.default_retraction_method(M, typeof(p)))
X = rand(M; vector_at = p)
X .= tol * rand() * X / norm(M, p, X)
return retract(M, p, X, retraction_method)
end
# Functions for estimating the constant stepsize
ζ(δ) = π/((2 + δ) * √k_max_gr) * cot(π/((2 + δ) * √k_max_gr))
function λ_δ(δ, M, h, grad_g, p1, k_max, D, N=100)
points = [close_point(M, p1, D/2) for _ in 1:N]
α_g = maximum([norm(M, p, grad_g(M, p)) for p in points])
α_1 = minimum([h(M, p) for p in points])
α_2 = maximum([h(M, p) for p in points])
π_k = π / √k_max
return (√(4*(α_2 - α_1)^2 + π_k^2/(2+δ)^2 * α_g^2) - 2*(α_2 - α_1))/(2*α_g^2)
end
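In formulas, and matching the implementation above, the two quantities computed for the constant stepsize are
\[\zeta(\delta) = \frac{\pi}{(2+\delta)\sqrt{k_{\max}}} \cot\Big(\frac{\pi}{(2+\delta)\sqrt{k_{\max}}}\Big),
\qquad
\lambda_\delta = \frac{\sqrt{4(\alpha_2 - \alpha_1)^2 + \frac{\pi_k^2}{(2+\delta)^2}\,\alpha_g^2} - 2(\alpha_2 - \alpha_1)}{2\alpha_g^2},\]
where $\pi_k = \frac{\pi}{\sqrt{k_{\max}}}$, $\alpha_g$ is the largest sampled norm of $\operatorname{grad} g$, and $\alpha_1$, $\alpha_2$ are the smallest and largest sampled values of $h$ over points drawn from a ball of radius $D/2$ around the anchor. In the experiment below, the constant stepsize is then taken as $\min\{\lambda_\delta, \zeta(\delta)/L_g\}$, clipped at zero.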
We introduce some keyword arguments for the solvers we will use in this experiment.
pgm_kwargs(initial_stepsize) = [
:record => [:Iteration, :Cost, :Iterate],
:return_state => true,
:stepsize => ProximalGradientMethodBacktracking(;
strategy=:nonconvex,
initial_stepsize=initial_stepsize,
stop_when_stepsize_less=atol,
),
:stopping_criterion => StopWhenAny(
StopWhenGradientMappingNormLess(atol), StopAfterIteration(max_iters)
),
]
pgm_bm_kwargs(initial_stepsize) = [
:record => [:Iteration, :Cost, :Iterate],
:return_state => true,
:stepsize => ProximalGradientMethodBacktracking(;
strategy=:nonconvex,
initial_stepsize=initial_stepsize,
stop_when_stepsize_less=atol,
),
:stopping_criterion => StopWhenAny(
StopWhenGradientMappingNormLess(atol), StopAfterIteration(max_iters)
),
]
#
pgm_kwargs_constant(stepsize) = [
:record => [:Iteration, :Cost, :Iterate],
:return_state => true,
:stepsize => ConstantLength(stepsize),
:stopping_criterion => StopWhenAny(
StopWhenGradientMappingNormLess(atol), StopAfterIteration(max_iters)
),
]
pgm_bm_kwargs_constant(stepsize) = [
:record => [:Iteration, :Cost, :Iterate],
:return_state => true,
:stepsize => ConstantLength(stepsize),
:stopping_criterion => StopWhenAny(
StopWhenGradientMappingNormLess(atol), StopAfterIteration(max_iters)
),
]
#
cppa_kwargs(M) = [
:record => [:Iteration, :Cost, :Iterate],
:return_state => true,
:stopping_criterion => StopWhenAny(
StopAfterIteration(max_iters), StopWhenChangeLess(M, atol)
),
]
cppa_bm_kwargs(M) = [
:record => [:Iteration, :Cost, :Iterate],
:return_state => true,
:stopping_criterion => StopWhenAny(
StopAfterIteration(max_iters), StopWhenChangeLess(M, atol)
),
]
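The CPPA keyword arguments above are not used in the benchmark below, which focuses on the two NCRPG stepsize strategies; for completeness, the following small, self-contained sketch shows how the proximal maps from `proxes_f` would be passed to Manopt's `cyclic_proximal_point` (the names `M_cppa`, `data_cppa`, `q_bar_cppa`, and `p0_cppa` are introduced only for this illustration):
M_cppa = Grassmann(5, 2)
data_cppa = [rand(M_cppa) for _ in 1:10]
q_bar_cppa = rand(M_cppa)
p0_cppa = rand(M_cppa)
cppa_state = cyclic_proximal_point(
    M_cppa,
    (M, p) -> f(M, p, data_cppa, q_bar_cppa), # total objective on the small instance
    proxes_f(data_cppa, q_bar_cppa),          # proximal maps of the summands of f
    p0_cppa;
    cppa_kwargs(M_cppa)...,
)
cppa_minimizer = get_solver_result(cppa_state)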
Before running the experiments, we initialize data collection functions that we will use later.
global col_names_1 = [
:Dimension,
:Time_1,
:Iterations_1,
:Objective_1,
:Time_2,
:Iterations_2,
:Objective_2,
]
col_types_1 = [
Int64,
Float64,
Int64,
Float64,
Float64,
Int64,
Float64,
]
named_tuple_1 = (; zip(col_names_1, type[] for type in col_types_1 )...)
function initialize_dataframes(results_folder, experiment_name, subexperiment_name, named_tuple_1)
A1 = DataFrame(named_tuple_1)
CSV.write(
joinpath(
results_folder,
experiment_name *
"-Comparisons.csv",
),
A1;
header=false,
)
return A1
end
function export_dataframes(M, records, times, results_folder, experiment_name, subexperiment_name, col_names_1)
B1 = DataFrame(;
Dimension=manifold_dimension(M),
Time_1=times[1],
Iterations_1=maximum(first.(records[1])),
Objective_1=minimum([r[2] for r in records[1]]),
Time_2=times[2],
Iterations_2=maximum(first.(records[2])),
Objective_2=minimum([r[2] for r in records[2]]),
)
return B1
end
function write_dataframes(
B1,
results_folder,
experiment_name,
subexperiment_name
)
CSV.write(
joinpath(
results_folder,
experiment_name *
"-Comparisons.csv",
),
B1;
append=true,
)
end
subexperiment_name = "Gr"
global A1_Gr = initialize_dataframes(
results_folder,
experiment_name,
subexperiment_name,
named_tuple_1,
)
for (n, m) in gr_dims
Random.seed!(random_seed)
M = Grassmann(Int(n), Int(m))
anchor = rand(M)
vectors = [rand(M; vector_at=anchor, σ=σ) for _ in 1:N+2]
shorten_vectors!(M, anchor, vectors)
data = [exp(M, anchor, vectors[i]) for i in 1:N]
q_bar = exp(M, anchor, vectors[N+1]) # point for the median component, i.e. h
p0 = exp(M, anchor, vectors[N+2])
g_gr(M, p) = g(M, p, data)
h_gr(M, p) = h(M, p, q_bar)
grad_g_gr(M, p) = grad_g(M, p, data)
proxes_f_gr = proxes_f(data, q_bar)
prox_h_gr(M, λ, p) = prox_h(M, λ, p, q_bar)
f_gr(M, p) = f(M, p, data, q_bar)
D = 2maximum([distance(M, anchor, pt) for pt in vcat(data, [q_bar], [p0])])
L_g = 1.0 # since k_min = 0.0
λ = minimum([λ_δ(δ, M, h_gr, grad_g_gr, anchor, k_max_gr, D, 100) for _ in 1:10]) # estimate λ_δ
constant_stepsize = max(0, min(λ, ζ(δ)/L_g))
initial_stepsize = 1.0
# Optimization
pgm_constant = proximal_gradient_method(M, f_gr, g_gr, grad_g_gr, p0; prox_nonsmooth=prox_h_gr, pgm_kwargs_constant(constant_stepsize)...)
pgm_constant_result = get_solver_result(pgm_constant)
pgm_constant_record = get_record(pgm_constant)
# We can also use a backtracked stepsize
pgm = proximal_gradient_method(M, f_gr, g_gr, grad_g_gr, p0; prox_nonsmooth=prox_h_gr, pgm_kwargs(initial_stepsize)...)
pgm_result = get_solver_result(pgm)
pgm_record = get_record(pgm)
records = [
pgm_constant_record,
pgm_record,
]
if benchmarking
pgm_constant_bm = @benchmark proximal_gradient_method($M, $f_gr, $g_gr, $grad_g_gr, $p0; prox_nonsmooth=$prox_h_gr, $pgm_bm_kwargs_constant($constant_stepsize)...)
pgm_bm = @benchmark proximal_gradient_method($M, $f_gr, $g_gr, $grad_g_gr, $p0; prox_nonsmooth=$prox_h_gr, $pgm_bm_kwargs($initial_stepsize)...)
times = [
median(pgm_constant_bm).time * 1e-9,
median(pgm_bm).time * 1e-9,
]
B1 = export_dataframes(
M,
records,
times,
results_folder,
experiment_name,
subexperiment_name,
col_names_1,
)
append!(A1_Gr, B1)
(export_table) && (write_dataframes(B1, results_folder, experiment_name, subexperiment_name))
end
end
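After the loop, the collected results are available in the `A1_Gr` data frame (and, if exported, in the CSV file); for instance, it can be printed as a Markdown table with PrettyTables:
pretty_table(A1_Gr; tf=tf_markdown)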
We can take a look at how the two stepsize strategies compare in the following table, where columns 2 to 4 refer to NCRPG with the constant stepsize and columns 5 to 7 to NCRPG with the backtracked stepsize; times are in seconds. In every dimension both variants reach the same objective value, with backtracking requiring noticeably fewer iterations and less time.
| Dimension | Time_1 (s) | Iterations_1 | Objective_1 | Time_2 (s) | Iterations_2 | Objective_2 |
| --- | --- | --- | --- | --- | --- | --- |
| 6 | 0.389674 | 90 | 0.853678 | 0.0987565 | 12 | 0.853678 |
| 24 | 0.583378 | 58 | 0.858344 | 0.180316 | 9 | 0.858344 |
| 400 | 2.37086 | 36 | 0.868749 | 0.70433 | 6 | 0.868749 |
| 1600 | 8.63405 | 35 | 0.871773 | 4.17984 | 6 | 0.871773 |
| 6400 | 33.6688 | 34 | 0.873426 | 9.63182 | 5 | 0.873426 |
Technical details
This tutorial is cached. It was last run on the following package versions.
using Pkg
Pkg.status()
Status `~/Repositories/Julia/ManoptExamples.jl/examples/Project.toml`
[6e4b80f9] BenchmarkTools v1.6.0
[336ed68f] CSV v0.10.15
[13f3f980] CairoMakie v0.15.3
[0ca39b1e] Chairmarks v1.3.1
[35d6a980] ColorSchemes v3.30.0
⌅ [5ae59095] Colors v0.12.11
[a93c6f00] DataFrames v1.7.0
[7073ff75] IJulia v1.29.0
[682c06a0] JSON v0.21.4
[8ac3fa9e] LRUCache v1.6.2
[b964fa9f] LaTeXStrings v1.4.0
[d3d80556] LineSearches v7.4.0
[ee78f7c6] Makie v0.24.3
[af67fdf4] ManifoldDiff v0.4.4
[1cead3c2] Manifolds v0.10.22
[3362f125] ManifoldsBase v1.2.0
[0fc0a36d] Manopt v0.5.20
[5b8d5e80] ManoptExamples v0.1.14 `..`
[51fcb6bd] NamedColors v0.2.3
⌃ [91a5bcdd] Plots v1.40.16
[08abe8d2] PrettyTables v2.4.0
⌃ [6099a3de] PythonCall v0.9.25
[f468eda6] QuadraticModels v0.9.13
[1e40b3f8] RipQP v0.7.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`
This tutorial was last rendered July 16, 2025, 13:37:16.
Literature
- [Bac14]
- M. Bačák. Computing medians and means in Hadamard spaces. SIAM Journal on Optimization 24, 1542–1566 (2014), arXiv:1210.2145.
- [BJJP25b]
- R. Bergmann, H. Jasa, P. J. John and M. Pfeffer. The Intrinsic Riemannian Proximal Gradient Method for Nonconvex Optimization, preprint (2025), arXiv:2506.09775.