Hyperparameter optimization
Mateusz Baran 2024-08-03
Introduction
This example shows how to automatically select the best values of hyperparameters of optimization procedures, such as the retraction, the vector transport, the L-BFGS memory size, or line search coefficients. Hyperparameter optimization relies on the Optuna [ASY+19] Python library because it is much more advanced than similar Julia projects, offering Bayesian optimization with conditional hyperparameters and early stopping.
General definitions
Here are some general definitions that you will most likely be able to use directly for your problem without any changes. Just remember to install optuna, for example using the CondaPkg.jl Julia library.
using Manifolds, Manopt
using PythonCall
using BenchmarkTools
using LineSearches
# This script requires optuna to be available through PythonCall
# You can install it for example using
# using CondaPkg
# ]conda add optuna
# this has to be done in the Julia environment in which the notebook is rendered
# *before* running the notebook
optuna = pyimport("optuna")
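# sup-norm (ℓ∞) of the coordinate representation of the tangent vector X;
# used as the gradient norm in the stopping criterion of the benchmark below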
norm_inf(M::AbstractManifold, p, X) = norm(X, Inf)
# The TTsuggest_* structs mimic Optuna's suggest_* API, replaying fixed baseline values
# during a calibrating optimization run that is handled by the compute_pruning_losses function
struct TTsuggest_int
suggestions::Dict{String,Int}
end
function (s::TTsuggest_int)(name::String, a, b)
return s.suggestions[name]
end
struct TTsuggest_float
suggestions::Dict{String,Float64}
end
function (s::TTsuggest_float)(name::String, a, b; log::Bool=false)
return s.suggestions[name]
end
struct TTsuggest_categorical
suggestions::Dict{String,Any}
end
function (s::TTsuggest_categorical)(name::String, vals)
return s.suggestions[name]
end
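# TTreport records the loss values reported during the calibrating run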
struct TTreport
reported_vals::Vector{Float64}
end
function (r::TTreport)(val, i)
return push!(r.reported_vals, val)
end
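# should_prune always returns false so that the calibrating run evaluates all problem sizes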
struct TTshould_prune end
(::TTshould_prune)() = Py(false)
struct TracingTrial
suggest_int::TTsuggest_int
suggest_float::TTsuggest_float
suggest_categorical::TTsuggest_categorical
report::TTreport
should_prune::TTshould_prune
end
function compute_pruning_losses(
od,
int_suggestions::Dict{String,Int},
float_suggestions::Dict{String,Float64},
categorical_suggestions::Dict{String,Int},
)
tt = TracingTrial(
TTsuggest_int(int_suggestions),
TTsuggest_float(float_suggestions),
TTsuggest_categorical(categorical_suggestions),
TTreport(Float64[]),
TTshould_prune(),
)
od(tt)
return tt.report.reported_vals
end
The next part is your hyperparameter optimization objective. The ObjectiveData struct contains all relevant information about the sequence of specific problems. The central field is N_range: early stopping requires a series of progressively more complex problems, which are attempted from the simplest to the most complex and are specified by the values of N in that vector.
mutable struct ObjectiveData{TObj,TGrad}
obj::TObj
grad::TGrad
N_range::Vector{Int}
gtol::Float64
vts::Vector{AbstractVectorTransportMethod}
retrs::Vector{AbstractRetractionMethod}
manifold_constructors::Vector{Tuple{String,Any}}
pruning_losses::Vector{Float64}
manopt_stepsize::Vector{Tuple{String,Any}}
obj_loss_coeff::Float64
end
In the example below we optimize hyperparameters on a sequence of Rosenbrock-type problems restricted to spheres:
\[\arg\min_{p \in S^{N-1}} \sum_{i=1}^{N/2} \left[(1-p_{2i-1})^2 + 100 (p_{2i} - p_{2i-1}^2)^2\right],\]
where $N \in \{2, 16, 128, 1024\}$ (the values of Ns defined in lbfgs_study below).
obj and grad are the objective and gradient, defined below. Note that the gradient works in-place, and variants without the manifold argument are also provided for easier comparison with other libraries like Optim.jl. It is easiest when problems for different values of N can be distinguished by being defined on successively larger manifolds, but the script could be modified so that this is not necessary.
pruning_losses and compute_pruning_losses are related to early pruning used in Optuna and you shouldn’t have to modify them.
function f_rosenbrock(x)
result = 0.0
for i in 1:2:length(x)
result += (1.0 - x[i])^2 + 100.0 * (x[i + 1] - x[i]^2)^2
end
return result
end
function f_rosenbrock(::AbstractManifold, x)
return f_rosenbrock(x)
end
function g_rosenbrock!(storage, x)
for i in 1:2:length(x)
storage[i] = -2.0 * (1.0 - x[i]) - 400.0 * (x[i + 1] - x[i]^2) * x[i]
storage[i + 1] = 200.0 * (x[i + 1] - x[i]^2)
end
return storage
end
function g_rosenbrock!(M::AbstractManifold, storage, x)
g_rosenbrock!(storage, x)
riemannian_gradient!(M, storage, x, storage)
return storage
end
Next, gtol is the tolerance used for the stopping criterion in optimization. vts and retrs are, respectively, the vector transport and retraction methods selected through hyperparameter optimization. Some items need to be different for different values of N, for example the manifold over which the problem is defined. This is handled by manifold_constructors, which in this example is defined as Tuple{String,Any}[("Sphere", N -> Manifolds.Sphere(N - 1))], where the string "Sphere" identifies the manifold family and the next element is a function that transforms the value of N to the manifold for the problem of size N.
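For illustration, the constructor stored in such a tuple maps the problem size N to a concrete manifold. A minimal sketch, with an illustrative N = 16, of how it is used inside a trial:
manifold_name, manifold_constructor = ("Sphere", N -> Manifolds.Sphere(N - 1))
M = manifold_constructor(16) # Sphere(15), the unit sphere embedded in R^16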
Similarly, different stepsize selection methods may be considered. This is handled by the field manopt_stepsize. It will be easiest to see how it works by looking at how it is initialized:
Tuple{String,Any}[
("LS-HZ", M -> Manopt.LineSearchesStepsize(ls_hz)),
("Wolfe-Powell", (M, sufficient_decrease, sufficient_curvature) -> Manopt.WolfePowellLinesearch(; sufficient_decrease = sufficient_decrease, sufficient_curvature = sufficient_curvature)),
]
Each tuple contains a string identifying the line search method and a constructor of the line search that takes the relevant arguments, such as the manifold or numerical parameters.
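As a sketch, building a concrete stepsize from such a tuple could look as follows; the numerical values below are placeholders, not tuned settings:
name, ctor = ("Wolfe-Powell",
    (M, sufficient_decrease, sufficient_curvature) -> Manopt.WolfePowellLinesearch(;
        sufficient_decrease=sufficient_decrease, sufficient_curvature=sufficient_curvature))
ls = ctor(Manifolds.Sphere(15), 1e-4, 1.0 - 1e-3) # manifold, sufficient decrease, sufficient curvature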
The next part is the trial evaluation procedure. This is one of the more important places that need to be customized to your problem: it is where we tell Optuna about the relevant optimization hyperparameters and use them to define specific problems. The hyperparameter optimization is inherently a multiobjective problem: we want the achieved objective values to be as good as possible and the run times as low as possible. Since Optuna doesn't currently support multicriteria pruning, which is important for obtaining a solution in a reasonable amount of time, we use a linear combination of the sub-objectives to turn the problem into a single-criterion optimization. The hyperparameter optimization objective is thus a linear combination of achieved objective values and run times; the relative weight is controlled by objective.obj_loss_coeff.
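Schematically, each problem size N contributes its measured run time plus a weighted final objective value to the loss. A tiny sketch of this scalarization with made-up numbers:
manopt_time = 120.0   # median benchmark time for this N (made-up value)
manopt_obj = 3.5e-6   # final objective value reached (made-up value)
obj_loss_coeff = 10.0
loss_contribution = manopt_time + obj_loss_coeff * manopt_obj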
function (objective::ObjectiveData)(trial)
# Here we use optuna to select memory length for L-BFGS -- an integer in the range between 2 and 30, referenced by name "mem_len"
mem_len = trial.suggest_int("mem_len", 2, 30)
# Here we select a vector transport and retraction methods, one of those specified in the `ObjectiveData`.
vt = objective.vts[pyconvert(
Int,
trial.suggest_categorical(
"vector_transport_method", Vector(eachindex(objective.vts))
),
)]
retr = objective.retrs[pyconvert(
Int,
trial.suggest_categorical("retraction_method", Vector(eachindex(objective.retrs))),
)]
# Here we select the manifold constructor, in case we want to try different manifolds for our problem. For example one could try defining a problem with orthogonality constraints on Stiefel, Grassmann or flag manifold.
manifold_name, manifold_constructor = objective.manifold_constructors[pyconvert(
Int,
trial.suggest_categorical(
"manifold", Vector(eachindex(objective.manifold_constructors))
),
)]
# Here the stepsize selection method type is selected.
manopt_stepsize_name, manopt_stepsize_constructor = objective.manopt_stepsize[pyconvert(
Int,
trial.suggest_categorical(
"manopt_stepsize", Vector(eachindex(objective.manopt_stepsize))
),
)]
# This parametrizes stepsize selection methods with relevant numerical parameters.
local sufficient_decrease_val, sufficient_curvature_val, hz_sigma
if manopt_stepsize_name == "Wolfe-Powell"
sufficient_decrease_val = pyconvert(
Float64, trial.suggest_float("Wolfe-Powell sufficient_decrease", 1e-5, 1e-2; log=true)
)
sufficient_curvature_val =
1.0 - pyconvert(
Float64, trial.suggest_float("Wolfe-Powell 1-sufficient_curvature", 1e-4, 1e-2; log=true)
)
elseif manopt_stepsize_name == "Improved HZ"
hz_sigma = pyconvert(Float64, trial.suggest_float("Improved HZ sigma", 0.1, 0.9))
end
# The current loss estimate, taking into account estimated loss values for larger, not-yet-evaluated values of `N`.
loss = sum(objective.pruning_losses)
# Here iterate over problems we want to optimize for
# from smallest to largest; pruning should stop the iteration early
# if the hyperparameter set is not promising
cur_i = 0
for N in objective.N_range
# Here we define the initial point for the optimization procedure
p0 = zeros(N)
p0[1] = 1
M = manifold_constructor(N)
# Here we construct the specific line search to be used
local ls
if manopt_stepsize_name == "Wolfe-Powell"
ls = manopt_stepsize_constructor(M, sufficient_decrease_val, sufficient_curvature_val)
elseif manopt_stepsize_name == "Improved HZ"
ls = manopt_stepsize_constructor(M, hz_sigma)
else
ls = manopt_stepsize_constructor(M)
end
manopt_time, manopt_iters, manopt_obj = benchmark_time_state(
ManoptQN(),
M,
N,
objective.obj,
objective.grad,
p0,
ls,
pyconvert(Int, mem_len),
objective.gtol;
vector_transport_method=vt,
retraction_method=retr,
)
# TODO: turn this into multi-criteria optimization when Optuna starts supporting
# pruning in such problems
loss -= objective.pruning_losses[cur_i + 1]
loss += manopt_time + objective.obj_loss_coeff * manopt_obj
trial.report(loss, cur_i)
if pyconvert(Bool, trial.should_prune().__bool__())
throw(PyException(optuna.TrialPruned()))
end
cur_i += 1
end
return loss
end
In the following benchmarking code you will most likely have to adapt solver parameters. It is designed around quasi_Newton but can be adapted to any solver as needed. The example below performs a small number of trials for quick rendering, but in practice you should aim for at least a few thousand trials (the n_trials parameter). A standalone usage sketch of the benchmark helper follows after its definition.
# An abstract type in case we want to try different optimization packages.
abstract type AbstractOptimConfig end
struct ManoptQN <: AbstractOptimConfig end
# Benchmark that evaluates a set of hyperparameters. Returns the median time to reach the solution (in microseconds), the number of iterations, and the final value of the objective.
function benchmark_time_state(
::ManoptQN,
M::AbstractManifold,
N,
f,
g!,
p0,
stepsize::Union{Manopt.Stepsize,Manopt.ManifoldDefaultsFactory{<:Manopt.Stepsize}},
mem_len::Int,
gtol::Real;
kwargs...,
)
manopt_sc = StopWhenGradientNormLess(gtol; norm=norm_inf) | StopAfterIteration(1000)
mem_len = min(mem_len, manifold_dimension(M))
manopt_state = quasi_Newton(
M,
f,
g!,
p0;
stepsize=stepsize,
evaluation=InplaceEvaluation(),
return_state=true,
memory_size=mem_len,
stopping_criterion=manopt_sc,
debug=[],
kwargs...,
)
bench_manopt = @benchmark quasi_Newton(
$M,
$f,
$g!,
$p0;
stepsize=$(stepsize),
evaluation=$(InplaceEvaluation()),
memory_size=$mem_len,
stopping_criterion=$(manopt_sc),
debug=[],
$kwargs...,
)
iters = get_count(manopt_state, :Iterations)
final_val = f(M, manopt_state.p)
return median(bench_manopt.times) / 1000, iters, final_val
end
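To sanity-check a single configuration outside of Optuna you can call benchmark_time_state directly. The sketch below reuses the definitions above; the concrete choices (N = 16, Hager-Zhang line search, memory length 4, tolerance 1e-5, parallel transport, exponential retraction) are only illustrative:
M_check = Manifolds.Sphere(15)
p0_check = zeros(16)
p0_check[1] = 1.0
check_time, check_iters, check_obj = benchmark_time_state(
    ManoptQN(),
    M_check,
    16,
    f_rosenbrock,
    g_rosenbrock!,
    p0_check,
    Manopt.LineSearchesStepsize(LineSearches.HagerZhang()),
    4,
    1e-5;
    vector_transport_method=ParallelTransport(),
    retraction_method=ExponentialRetraction(),
)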
"""
lbfgs_study(; pruning_coeff::Float64=0.95)
Set up the example hyperparameter optimization study.
"""
function lbfgs_study(; pruning_coeff::Float64=0.95)
Ns = [2^n for n in 1:3:12]
ls_hz = LineSearches.HagerZhang()
od = ObjectiveData(
f_rosenbrock,
g_rosenbrock!,
Ns,
1e-5,
AbstractVectorTransportMethod[ParallelTransport(), ProjectionTransport()],
[ExponentialRetraction(), ProjectionRetraction()],
Tuple{String,Any}[("Sphere", N -> Manifolds.Sphere(N - 1))],
zeros(Float64, eachindex(Ns)),
Tuple{String,Any}[
("LS-HZ", M -> Manopt.LineSearchesStepsize(ls_hz)),
#("Improved HZ", (M, sigma) -> HagerZhangLinesearch(M; sigma=sigma)),
("Wolfe-Powell", (M, sufficient_decrease, sufficient_curvature) -> Manopt.WolfePowellLinesearch(; sufficient_decrease = sufficient_decrease, sufficient_curvature = sufficient_curvature)),
],
10.0,
)
# Here you need to define baseline values of all hyperparameters
baseline_pruning_losses = compute_pruning_losses(
od,
Dict("mem_len" => 4),
Dict(
"Wolfe-Powell sufficient_decrease" => 1e-4,
"Wolfe-Powell 1-sufficient_curvature" => 1e-3,
"Improved HZ sigma" => 0.9,
),
Dict(
"vector_transport_method" => 1,
"retraction_method" => 1,
"manifold" => 1,
"manopt_stepsize" => 1,
),
)
od.pruning_losses = pruning_coeff * baseline_pruning_losses
study = optuna.create_study(; study_name="L-BFGS")
# Here you can specify number of trials and timeout (in seconds).
study.optimize(od; n_trials=1000, timeout=500)
println("Best params is $(study.best_params) with value $(study.best_value)")
selected_manifold = od.manifold_constructors[pyconvert(Int, study.best_params["manifold"])][1]
selected_retraction_method = od.retrs[pyconvert(Int, study.best_params["retraction_method"])]
selected_vector_transport = od.vts[pyconvert(Int, study.best_params["vector_transport_method"])]
println("Selected manifold: $(selected_manifold)")
println("Selected retraction method: $(selected_retraction_method)")
println("Selected vector transport method: $(selected_vector_transport)")
return study
end
lbfgs_study()
[I 2025-11-06 15:01:01,300] A new study created in memory with name: L-BFGS
[I 2025-11-06 15:01:39,567] Trial 0 finished with value: 6113.890084581559 and parameters: {'mem_len': 7, 'vector_transport_method': 1, 'retraction_method': 2, 'manifold': 1, 'manopt_stepsize': 1}. Best is trial 0 with value: 6113.890084581559.
[I 2025-11-06 15:02:20,221] Trial 1 finished with value: 154814.93166376086 and parameters: {'mem_len': 22, 'vector_transport_method': 2, 'retraction_method': 1, 'manifold': 1, 'manopt_stepsize': 2, 'Wolfe-Powell sufficient_decrease': 2.2329267844390062e-05, 'Wolfe-Powell 1-sufficient_curvature': 0.0055127501906131774}. Best is trial 0 with value: 6113.890084581559.
[I 2025-11-06 15:02:57,165] Trial 2 finished with value: 6124.912084581583 and parameters: {'mem_len': 15, 'vector_transport_method': 2, 'retraction_method': 2, 'manifold': 1, 'manopt_stepsize': 1}. Best is trial 0 with value: 6113.890084581559.
[I 2025-11-06 15:03:31,680] Trial 3 finished with value: 6090.120584581558 and parameters: {'mem_len': 7, 'vector_transport_method': 1, 'retraction_method': 2, 'manifold': 1, 'manopt_stepsize': 1}. Best is trial 3 with value: 6090.120584581558.
[I 2025-11-06 15:04:06,175] Trial 4 finished with value: 6109.476084581554 and parameters: {'mem_len': 14, 'vector_transport_method': 1, 'retraction_method': 2, 'manifold': 1, 'manopt_stepsize': 1}. Best is trial 3 with value: 6090.120584581558.
[I 2025-11-06 15:04:40,786] Trial 5 finished with value: 6108.611584581555 and parameters: {'mem_len': 26, 'vector_transport_method': 1, 'retraction_method': 2, 'manifold': 1, 'manopt_stepsize': 1}. Best is trial 3 with value: 6090.120584581558.
[I 2025-11-06 15:04:48,235] Trial 6 pruned.
[I 2025-11-06 15:05:22,534] Trial 7 finished with value: 6057.355084581624 and parameters: {'mem_len': 4, 'vector_transport_method': 2, 'retraction_method': 2, 'manifold': 1, 'manopt_stepsize': 1}. Best is trial 7 with value: 6057.355084581624.
[I 2025-11-06 15:05:29,959] Trial 8 pruned.
[I 2025-11-06 15:05:53,521] Trial 9 pruned.
[I 2025-11-06 15:06:28,408] Trial 10 pruned.
[I 2025-11-06 15:06:54,861] Trial 11 pruned.
[I 2025-11-06 15:07:02,401] Trial 12 pruned.
[I 2025-11-06 15:07:10,090] Trial 13 pruned.
[I 2025-11-06 15:07:26,574] Trial 14 pruned.
[I 2025-11-06 15:07:33,994] Trial 15 pruned.
[I 2025-11-06 15:07:41,526] Trial 16 pruned.
[I 2025-11-06 15:07:49,059] Trial 17 pruned.
[I 2025-11-06 15:08:15,446] Trial 18 pruned.
[I 2025-11-06 15:08:22,923] Trial 19 pruned.
[I 2025-11-06 15:08:30,406] Trial 20 pruned.
[I 2025-11-06 15:08:37,893] Trial 21 pruned.
[I 2025-11-06 15:09:12,437] Trial 22 pruned.
[I 2025-11-06 15:09:19,869] Trial 23 pruned.
[I 2025-11-06 15:09:35,027] Trial 24 pruned.
Best params is {'mem_len': 4, 'vector_transport_method': 2, 'retraction_method': 2, 'manifold': 1, 'manopt_stepsize': 1} with value 6057.355084581624
Selected manifold: Sphere
Selected retraction method: ManifoldsBase.ProjectionRetraction()
Selected vector transport method: ManifoldsBase.ProjectionTransport()
Python: <optuna.study.study.Study object at 0x7690e17f0d70>
Summary
We’ve shown how to automatically select the best hyperparameter values for your optimization problem.
Technical details
This tutorial is cached. It was last run on the following package versions.
Status `~/.julia/dev/ManoptExamples/examples/Project.toml`
[6e4b80f9] BenchmarkTools v1.6.3
[336ed68f] CSV v0.10.15
[13f3f980] CairoMakie v0.15.6
[0ca39b1e] Chairmarks v1.3.1
[35d6a980] ColorSchemes v3.31.0
[5ae59095] Colors v0.13.1
[a93c6f00] DataFrames v1.8.1
[31c24e10] Distributions v0.25.122
[682c06a0] JSON v1.2.0
[8ac3fa9e] LRUCache v1.6.2
[b964fa9f] LaTeXStrings v1.4.0
[d3d80556] LineSearches v7.4.0
[ee78f7c6] Makie v0.24.6
[af67fdf4] ManifoldDiff v0.4.5
[1cead3c2] Manifolds v0.11.3
[3362f125] ManifoldsBase v2.2.0
[0fc0a36d] Manopt v0.5.26
[5b8d5e80] ManoptExamples v0.1.17 `..`
[51fcb6bd] NamedColors v0.2.3
[91a5bcdd] Plots v1.41.1
[08abe8d2] PrettyTables v3.1.0
[6099a3de] PythonCall v0.9.28
[f468eda6] QuadraticModels v0.9.14
[1e40b3f8] RipQP v0.7.0
This tutorial was last rendered November 6, 2025, 15:09:36.
Literature
- [ASY+19] T. Akiba, S. Sano, T. Yanase, T. Ohta and M. Koyama. Optuna: A Next-generation Hyperparameter Optimization Framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019), arXiv:1907.10902.