A Geodesically Convex Example on SPDs

Hajg Jasa 2025-04-16

Introduction

In this example we compare the Convex Riemannian Proximal Gradient (CRPG) method [BJJP25a] with a constant stepsize against its variant with a backtracked stepsize on the manifold of symmetric positive definite matrices. This example reproduces the results from [BJJP25a], Section 5.3.

using PrettyTables
using BenchmarkTools
using CSV, DataFrames
using ColorSchemes, Plots, LaTeXStrings, CairoMakie, Chairmarks
using Random, LinearAlgebra, LRUCache
using ManifoldDiff, Manifolds, Manopt, ManoptExamples
import ColorSchemes.tol_vibrant

The Problem

Let $\mathcal M = \mathcal P(n)$ be the manifold of symmetric positive definite $n \times n$ matrices, equipped with the affine-invariant Riemannian metric.

Let $g \colon \mathcal M \to \mathbb R$ be defined by

\[g(p) = \log(\det(p))^4.\]

Observe that $p \mapsto \log(\det(p))$ is affine along the geodesics of this metric, hence its fourth power $g$ is geodesically convex with respect to the Riemannian metric on $\mathcal M$.

Let now $q \in \mathcal M$ be a given point, and let $h \colon \mathcal M \to \mathbb R$ be defined by

\[h(p) = \tau \, \mathrm{dist}(p, q),\]

for some $\tau > 0$. We define our total objective function as $f = g + h$. Notice that this objective function is also geodesically convex with respect to the Riemannian metric on $\mathcal M$. The goal is to find a minimizer of $f$ on $\mathcal M$. Intuitively, such a minimizer balances the two terms: $g$ pulls it towards matrices with unit determinant, where $g$ vanishes, while $h$ pulls it towards the anchor point $q$, and the weight $\tau$ controls this trade-off.
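
For the implementation we need the Riemannian gradient of $g$ and the proximal map of $h$. With respect to the affine-invariant metric, the Riemannian gradient of $p \mapsto \log(\det(p))$ is $p$ itself, since the Euclidean gradient $p^{-1}$ is multiplied by $p$ from both sides when converting it to the Riemannian gradient. By the chain rule,

\[\operatorname{grad} g(p) = 4 \log(\det(p))^3 \, p.\]

Furthermore, since $\mathcal M$ is a Hadamard manifold, the proximal map of $h$ admits the closed form (see, e.g., [Bac14])

\[\operatorname{prox}_{\lambda h}(p) = \gamma_{p \to q}\!\left( \min\left\{ \frac{\lambda \tau}{\mathrm{dist}(p, q)}, 1 \right\} \right),\]

where $\gamma_{p \to q} \colon [0,1] \to \mathcal M$ denotes the minimizing geodesic from $p$ to $q$. The first expression is implemented as grad_g below, while the proximal map is provided by ManifoldDiff.prox_distance.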

Numerical Experiment

We initialize the experiment parameters, as well as some utility functions.

random_seed = 42

atol = 1e-7
max_iters = 20000
τ = 1/2 # weight for the component h
spd_dims = [2, 3, 4, 5]
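# Settings for data export and benchmarking used further below.
# The values here are illustrative; adjust folder and experiment name as needed.
experiment_name = "CRPG-SPD-Geodesically-Convex"
results_folder = joinpath(@__DIR__, experiment_name)
!isdir(results_folder) && mkpath(results_folder)
benchmarking = true  # set to false to skip the timing runs
export_table = true  # set to false to skip writing the CSV files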
# Objective, gradient, and proxes
g(M, p) = log(det(p))^4
grad_g(M, p) = 4log(det(p))^3 * p
# 
h(M, p, q) = τ * distance(M, p, q)
prox_h(M, λ, p, q) = ManifoldDiff.prox_distance(M, τ * λ, q, p, 1)
# 
f(M, p, q) = g(M, p) + h(M, p, q)
# Function to generate points close to the given point p
function close_point(M, p, tol; retraction_method=Manifolds.default_retraction_method(M, typeof(p)))
    X = rand(M; vector_at = p)
    X .= tol * rand() * X / norm(M, p, X)
    return retract(M, p, X, retraction_method)
end
# Estimate the Lipschitz constant of grad_g on a ball of radius R around the anchor point
# by sampling N random points and taking the largest pointwise bound
function theoretical_lipschitz_constant(M, anchor, n, R, N=100_000)
    constants = Float64[]
    for i in 1:N
        p = close_point(M, anchor, R)
        push!(constants, 12 * n * log(det(p))^2)
    end
    return maximum(constants)
end
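
As a quick, purely illustrative sanity check of the gradient formula (not part of the experiment), we can compare a finite-difference derivative of $g$ along a geodesic with the Riemannian inner product against grad_g; the point and tangent direction below are chosen arbitrarily.

# Illustrative check: the derivative of t ↦ g(exp_p(tX)) at t = 0
# should agree with the Riemannian inner product ⟨grad g(p), X⟩.
M_check = SymmetricPositiveDefinite(3)
p_check = 2.0 * Matrix(I, 3, 3)  # an arbitrary SPD point
X_check = 0.5 * Matrix(I, 3, 3)  # an arbitrary (symmetric) tangent vector
t_fd = 1e-6
fd_derivative = (g(M_check, exp(M_check, p_check, t_fd * X_check)) - g(M_check, p_check)) / t_fd
isapprox(fd_derivative, inner(M_check, p_check, grad_g(M_check, p_check), X_check); rtol=1e-3)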

We introduce some keyword arguments for the solvers we will use in this experiment. The variants with the _bm suffix are used for the benchmark runs below.

# Solver arguments for backtracking
pgm_kwargs(contraction_factor, initial_stepsize, warm_start_factor) = [
    :record => [:Iteration, :Cost, :Iterate],
    :return_state => true,
    :stepsize => ProximalGradientMethodBacktracking(; 
        contraction_factor=contraction_factor,
        strategy=:convex, 
        initial_stepsize=initial_stepsize,
        stop_when_stepsize_less=atol,
        warm_start_factor=warm_start_factor,
    ),
    :stopping_criterion => StopWhenAny(
        StopWhenGradientMappingNormLess(atol), StopAfterIteration(max_iters)
    ),
]
pgm_bm_kwargs(contraction_factor, initial_stepsize, warm_start_factor) = [
    :record => [:Iteration, :Cost, :Iterate],
    :return_state => true,
    :stepsize => ProximalGradientMethodBacktracking(;
        contraction_factor=contraction_factor, 
        strategy=:convex,   
        initial_stepsize=initial_stepsize,
        stop_when_stepsize_less=atol,
        warm_start_factor=warm_start_factor,
    ),
    :stopping_criterion => StopWhenAny(
        StopWhenGradientMappingNormLess(atol), StopAfterIteration(max_iters)
    ), 
]
# Solver arguments for constant stepsize 
pgm_kwargs_constant(stepsize) = [
    :record => [:Iteration, :Cost, :Iterate],
    :return_state => true,
    :stepsize => ConstantLength(stepsize),
    :stopping_criterion => StopWhenAny(
        StopWhenGradientMappingNormLess(atol), StopAfterIteration(max_iters)
    ),
]
pgm_bm_kwargs_constant(stepsize) = [
    :record => [:Iteration, :Cost, :Iterate],
    :return_state => true,
    :stepsize => ConstantLength(stepsize),
    :stopping_criterion => StopWhenAny(
        StopWhenGradientMappingNormLess(atol), StopAfterIteration(max_iters)
    ), 
]
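
For orientation, each iteration of the solver performs a Riemannian forward-backward step: a gradient step on the smooth component $g$ followed by a proximal step on the nonsmooth component $h$. The following is a minimal conceptual sketch of such a step with stepsize $\lambda$, not Manopt's actual implementation; the constant-stepsize variant fixes $\lambda = 1/L_g$, while the backtracking variant adapts $\lambda$ in every iteration.

# Conceptual sketch of one proximal gradient step (not Manopt's implementation);
# prox_h is expected to have the signature prox_h(M, λ, p), like prox_h_spd below.
function crpg_step_sketch(M, grad_g, prox_h, p, λ)
    p_forward = exp(M, p, -λ * grad_g(M, p))  # forward step along -grad g
    return prox_h(M, λ, p_forward)            # backward step: proximal map of λ*h
end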

Before running the experiments, we initialize data collection functions that we will use later.

# Header for the dataframe
global col_names_1 = [
    :Dimension,
    :Iterations_1,
    :Time_1,
    :Cost_1,
    :Iterations_2,
    :Time_2,
    :Cost_2,
]
col_types_1 = [
    Int64,
    Int64,
    Float64,
    Float64,
    Int64,
    Float64,
    Float64,
]
named_tuple_1 = (; zip(col_names_1, type[] for type in col_types_1 )...)
# Function for initializing the dataframe
function initialize_dataframes(
    results_folder, 
    experiment_name, 
    named_tuple_1,
)
    A1 = DataFrame(named_tuple_1)
    CSV.write(
        joinpath(
            results_folder,
            experiment_name * 
            "-Comparisons.csv",
        ),
        A1;
        header=false,
    )
    return A1
end
function export_dataframes(
    M, 
    records, 
    times, 
)
    B1 = DataFrame(;
        Dimension=manifold_dimension(M),
        Iterations_1=maximum(first.(records[1])),
        Time_1=times[1],
        Cost_1=minimum([r[2] for r in records[1]]),
        Iterations_2=maximum(first.(records[2])),
        Time_2=times[2],
        Cost_2=minimum([r[2] for r in records[2]]),
    )
    return B1
end
function write_dataframes(
    B1, 
    results_folder, 
    experiment_name, 
)
    CSV.write(
        joinpath(
            results_folder,
            experiment_name *
            "-Comparisons.csv",
        ),
        B1;
        append=true,
    )
end
global A1_SPD = initialize_dataframes(
    results_folder,
    experiment_name,
    named_tuple_1,
)
stats = Dict(:CRPG_CN => Dict(), :CRPG_BT => Dict())
for n in spd_dims

    Random.seed!(random_seed)

    M = SymmetricPositiveDefinite(Int(n))
    q = rand(M)
    p0 = rand(M)

    prox_h_spd(M, λ, p) = prox_h(M, λ, p, q)
    f_spd(M, p) = f(M, p, q)

    D = 4*distance(M, p0, q)
    # Conservative estimate of the Lipschitz constant for grad_g
    L_g = 1.05 * theoretical_lipschitz_constant(M, p0, n, D/2)
    constant_stepsize = 1/L_g
    initial_stepsize = 3/2 * constant_stepsize
    contraction_factor = 0.9
    warm_start_factor = 2.0

    # Optimization
    pgm_constant = proximal_gradient_method(M, f_spd, g, grad_g, p0;
        prox_nonsmooth=prox_h_spd,
        pgm_kwargs_constant(constant_stepsize)...
    )
    pgm_constant_result = get_solver_result(pgm_constant)
    pgm_constant_record = get_record(pgm_constant) 
    stats[:CRPG_CN][n] = Dict()
    stats[:CRPG_CN][n][:Iteration] = length(get_record(pgm_constant, :Iteration)) + 1
    stats[:CRPG_CN][n][:Cost] = get_record(pgm_constant, :Iteration, :Cost)
    pushfirst!(stats[:CRPG_CN][n][:Cost], f_spd(M, p0))

    # We can also use a backtracked stepsize
    pgm = proximal_gradient_method(M, f_spd, g, grad_g, p0; 
        prox_nonsmooth=prox_h_spd,
        pgm_kwargs(contraction_factor, initial_stepsize, warm_start_factor)...
    )
    pgm_result = get_solver_result(pgm)
    pgm_record = get_record(pgm)
    stats[:CRPG_BT][n] = Dict()
    stats[:CRPG_BT][n][:Iteration] = length(get_record(pgm, :Iteration)) + 1 
    stats[:CRPG_BT][n][:Cost] = get_record(pgm, :Iteration, :Cost)
    pushfirst!(stats[:CRPG_BT][n][:Cost], f_spd(M, p0))

    records = [
        pgm_constant_record,
        pgm_record,
    ]

    # Benchmarking
    if benchmarking
        pgm_constant_bm = @benchmark proximal_gradient_method($M, $f_spd, $g, $grad_g, $p0; 
            prox_nonsmooth=$prox_h_spd,
            $pgm_bm_kwargs_constant($constant_stepsize)...
        )
        pgm_bm = @benchmark proximal_gradient_method($M, $f_spd, $g, $grad_g, $p0; 
            prox_nonsmooth=$prox_h_spd,
            $pgm_bm_kwargs($contraction_factor, $initial_stepsize, $warm_start_factor)...
        )
        
        times = [
            median(pgm_constant_bm).time * 1e-9,
            median(pgm_bm).time * 1e-9,
        ]
        # Export the results
        B1 = export_dataframes(
            M,
            records,
            times,
        )
        append!(A1_SPD, B1)
        (export_table) && (write_dataframes(B1, results_folder, experiment_name))
    end
end

We can take a look at how the algorithms compare to each other in their performance with the following table, where columns 2 to 4 refer to CRPG with a constant stepsize and columns 5 to 7 to CRPG with a backtracked stepsize; times are given in seconds.

Dimension  Iterations_1  Time_1 (s)  Cost_1    Iterations_2  Time_2 (s)  Cost_2
        3           367  0.00415113  0.18593            250  0.00888871  0.18593
        6          1944  0.0439392   0.27078           1313  0.0795954   0.27078
       10          8640  0.272177    0.371274          5878  0.522786    0.371274
       15         15535  0.618404    0.449625         12937  3.06766     0.449625
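
The table above corresponds to the rows collected in A1_SPD; it can be reproduced, for instance, with the PrettyTables package loaded at the beginning of this example.

# Print the collected comparison results (one row per tested dimension)
pretty_table(A1_SPD)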

Lastly, we showcase the rate of decay of the function values for $n = 2$.

function plot_convergence(
    stats,
    dimensions=[2],
)
    # Extract the results for the specified dimensions
    crpg_cn = [stats[:CRPG_CN][d] for d in dimensions]
    crpg_bt = [stats[:CRPG_BT][d] for d in dimensions]

    figs = Vector{Figure}()

    for i in 1:length(crpg_cn)

        # Get the solvers' results and records
        # crpg_cn_result = crpg_cn[i][:Result]
        crpg_cn_record = crpg_cn[i][:Cost]

        # crpg_bt_result = crpg_bt[i][:Result]
        crpg_bt_record = crpg_bt[i][:Cost]


        # Calculate the minimum cost for relative error
        min_cost_crpg_cn = minimum(crpg_cn_record)
        min_cost_crpg_bt = minimum(crpg_bt_record)

        # Create vectors for plotting
        iterations_crpg_cn = 1:crpg_cn[i][:Iteration]
        iterations_crpg_bt = 1:crpg_bt[i][:Iteration]

        # Get initial error for scaling reference lines
        relative_errors_crpg_cn = max.(crpg_cn_record .- min_cost_crpg_cn, 0)
        initial_error_crpg_cn = relative_errors_crpg_cn[1]

        relative_errors_crpg_bt = max.(crpg_bt_record .- min_cost_crpg_bt, 0)
        initial_error_crpg_bt = relative_errors_crpg_bt[1]

        initial_error = max(
            initial_error_crpg_cn,
            initial_error_crpg_bt,
        )
        
        # Use the longer of the two iteration ranges for the reference-rate curves
        iterations = 1:max(
            crpg_cn[i][:Iteration],
            crpg_bt[i][:Iteration],
        )

        # Create reference trajectories for the two rates shown in the plot
        # O(1/k)
        ref_rate_1 = [initial_error/k for k in iterations]
        # O(1/2^k); use a floating-point base to avoid integer overflow for large k
        ref_rate_2k = [initial_error/2.0^k for k in iterations]

        # Create the convergence plot
        fig = Figure()
        ax = Axis(fig[1, 1],
            title =L"\mathcal{P}(%$(dimensions[i]))",
            xlabel =L"\text{Iterations }(k)",
            ylabel =L"f(p_k) - f_*",
            # xscale=log10,
            yscale=log10,
            yminorticksvisible = true, 
            yminorgridvisible = true,
            yminorticks = IntervalsBetween(8),
        )
        lines!(
            ax,
            iterations_crpg_cn,
            abs.(relative_errors_crpg_cn);
            label="CRPG, constant step",
            linewidth=2,
            color=tol_vibrant[1],
        )
        lines!(
            ax,
            iterations_crpg_bt,
            abs.(relative_errors_crpg_bt);
            label="CRPG, backtracked step",
            linewidth=2,
            color=tol_vibrant[5],
        )
        lines!(
            ax,
            iterations,
            abs.(ref_rate_1);
            linestyle=:dash,
            linewidth=1.5,
            color=:blue,
            label=L"\mathcal O\left(k^{-1}\right)"
        )
        lines!(
            ax,
            iterations,
            abs.(ref_rate_2k);
            linestyle=:dot,
            linewidth=1.5,
            color=:black,
            label=L"\mathcal O\left(2^{-k}\right)"
        )
        fig[1, 2] = Legend(fig, ax, framevisible = true)

        push!(figs, fig)
    end
    return figs
end
figs = plot_convergence(stats)
for fig in figs
    display(fig)
end

This is in line with the convergence rates of the CRPG method in the geodesically convex setting, as shown in [BJJP25a], Theorem 4.7.
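
In particular, in the geodesically convex setting one expects a sublinear rate of the form

\[f(p_k) - \min_{p \in \mathcal M} f(p) \in \mathcal O\!\left(\frac{1}{k}\right),\]

see [BJJP25a], Theorem 4.7, for the precise statement; this matches the $\mathcal O(k^{-1})$ reference line in the plots above.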

Technical details

This tutorial is cached. It was last run on the following package versions.

using Pkg
Pkg.status()
Status `~/Repositories/Julia/ManoptExamples.jl/examples/Project.toml`
  [6e4b80f9] BenchmarkTools v1.6.0
  [336ed68f] CSV v0.10.15
  [13f3f980] CairoMakie v0.15.3
  [0ca39b1e] Chairmarks v1.3.1
  [35d6a980] ColorSchemes v3.30.0
⌅ [5ae59095] Colors v0.12.11
  [a93c6f00] DataFrames v1.7.0
  [7073ff75] IJulia v1.29.0
  [682c06a0] JSON v0.21.4
  [8ac3fa9e] LRUCache v1.6.2
  [b964fa9f] LaTeXStrings v1.4.0
  [d3d80556] LineSearches v7.4.0
  [ee78f7c6] Makie v0.24.3
  [af67fdf4] ManifoldDiff v0.4.4
  [1cead3c2] Manifolds v0.10.22
  [3362f125] ManifoldsBase v1.2.0
  [0fc0a36d] Manopt v0.5.20 `../../Manopt.jl`
  [5b8d5e80] ManoptExamples v0.1.14 `..`
  [51fcb6bd] NamedColors v0.2.3
⌃ [91a5bcdd] Plots v1.40.16
  [08abe8d2] PrettyTables v2.4.0
⌃ [6099a3de] PythonCall v0.9.25
  [f468eda6] QuadraticModels v0.9.13
  [1e40b3f8] RipQP v0.7.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`

This tutorial was last rendered July 16, 2025, 16:47:53.

Literature

[Bac14]
M. Bačák. Computing medians and means in Hadamard spaces. SIAM Journal on Optimization 24, 1542–1566 (2014), arXiv:1210.2145.
[BJJP25a]
R. Bergmann, H. Jasa, P. J. John and M. Pfeffer. The Intrinsic Riemannian Proximal Gradient Method for Convex Optimization, preprint (2025), arXiv:2507.16055.