The Riemannian trust regions solver
Minimize a function
\[\operatorname*{\arg\,min}_{p ∈ \mathcal{M}}\ f(p)\]
by using the Riemannian trust-regions solver following [ABG06]. A model is built by lifting the objective at the $k$th iterate $p_k$, locally mapping the cost function $f$ to the tangent space as $f_k: T_{p_k}\mathcal M → ℝ$, $f_k(X) = f(\operatorname{retr}_{p_k}(X))$. The trust region subproblem is then defined as
\[\operatorname*{arg\,min}_{X ∈ T_{p_k}\mathcal M}\ m_k(X),\]
where
\[\begin{align*} m_k&: T_{p_k}\mathcal M → ℝ,\\ m_k(X) &= f(p_k) + ⟨\operatorname{grad} f(p_k), X⟩_{p_k} + \frac{1}{2}⟨\mathcal H_k(X),X⟩_{p_k}\\ \text{such that}&\ \lVert X \rVert_{p_k} ≤ Δ_k. \end{align*}\]
Here $Δ_k$ is the trust region radius, which is adapted in every iteration, and $\mathcal H_k$ is some symmetric linear operator that approximates the Hessian $\operatorname{Hess} f$ of $f$.
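To make the model concrete, the following is a minimal sketch evaluating $m_k$ for a given tangent vector, assuming Manifolds.jl and using the Rayleigh quotient on the sphere; the helper model_value and the use of the exact Hessian in place of $\mathcal H_k$ are illustrative assumptions, not part of Manopt's API.

```julia
using Manifolds, LinearAlgebra

# Rayleigh quotient on the sphere: the minimizer is an eigenvector
# corresponding to the smallest eigenvalue of A.
A = Symmetric([2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 4.0])
M = Sphere(2)
f(M, p) = p' * A * p
grad_f(M, p) = project(M, p, 2A * p)                    # Riemannian gradient
Hess_f(M, p, X) = project(M, p, 2A * X) - 2f(M, p) * X  # Riemannian Hessian

# m_k(X) = f(p) + ⟨grad f(p), X⟩_p + ½⟨H(X), X⟩_p  (hypothetical helper)
function model_value(M, f, grad_f, Hess_f, p, X)
    return f(M, p) + inner(M, p, grad_f(M, p), X) +
           inner(M, p, Hess_f(M, p, X), X) / 2
end
```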
Interface
Manopt.trust_regions — Function

trust_regions(M, f, grad_f, Hess_f, p=rand(M))
trust_regions(M, f, grad_f, p=rand(M))

run the Riemannian trust-regions solver for optimization on manifolds to minimize f; see [ABG06, CGT00].

If no Hessian is provided, it is approximated using finite differences, see ApproxHessianFiniteDifference. For solving the inner trust-region subproblem, that is finding an update vector, the truncated_conjugate_gradient_descent is used by default.
Input
- M: a manifold $\mathcal M$
- f: a cost function $f: \mathcal M → ℝ$ to minimize
- grad_f: the gradient $\operatorname{grad} f: \mathcal M → T\mathcal M$ of $f$
- Hess_f: (optional) the Hessian $\operatorname{Hess} f(p): T_p\mathcal M → T_p\mathcal M$, $X ↦ \operatorname{Hess} f(p)[X] = ∇_X\operatorname{grad} f(p)$
- p: (rand(M)) an initial value $p ∈ \mathcal M$
Keyword arguments
- acceptance_rate: (0.1) accept/reject threshold: if the performance ratio $ρ$ for the iterate is at least this value, the candidate is accepted. This value should be between $0$ and $\frac{1}{4}$.
- augmentation_threshold: (0.75) trust-region augmentation threshold: if $ρ$ is larger than this threshold and the solution is on the trust region boundary with negative curvature, the radius is extended (augmented)
- augmentation_factor: (2.0) trust-region augmentation factor
- evaluation: (AllocatingEvaluation) specify whether the gradient and Hessian work by allocation (default) or in place (InplaceEvaluation)
- κ: (0.1) the linear convergence target rate of the truncated_conjugate_gradient_descent method; used in a stopping criterion therein
- max_trust_region_radius: the maximum trust-region radius
- preconditioner: a preconditioner (a symmetric, positive definite operator that should approximate the inverse of the Hessian)
- project!: (copyto!) specify a projection operation for tangent vectors within the subsolver for numerical stability. The required form is (M, Y, p, X) -> ..., working in place of Y.
- randomize: set to true if the trust-region solve is to be initiated with a random tangent vector; in that case no preconditioner is used
- ρ_regularization: (1e3) regularize the performance evaluation $ρ$ to avoid numerical inaccuracies
- reduction_factor: (0.25) trust-region reduction factor
- reduction_threshold: (0.1) trust-region reduction threshold: if $ρ$ is below this threshold, the trust region radius is reduced by reduction_factor
- retraction_method: (default_retraction_method(M, typeof(p))) a retraction to use
- stopping_criterion: (StopAfterIteration(1000) | StopWhenGradientNormLess(1e-6)) a functor inheriting from StoppingCriterion indicating when to stop
- sub_kwargs: keyword arguments passed to the sub state and used to decorate the sub options
- sub_stopping_criterion: a stopping criterion for the sub solver, uses the same standard as TCG
- sub_problem: (DefaultManoptProblem(M, ConstrainedManifoldObjective(subcost, subgrad; evaluation=evaluation))) problem for the subsolver
- sub_state: (QuasiNewtonState) using QuasiNewtonLimitedMemoryDirectionUpdate with InverseBFGS and sub_stopping_criterion as a stopping criterion; see also sub_kwargs
- θ: (1.0) $1+θ$ is the superlinear convergence target rate of the truncated_conjugate_gradient_descent method; used in a stopping criterion therein
- trust_region_radius: the initial trust-region radius
Output
the obtained (approximate) minimizer $p^*$; see get_solver_return for details.
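A minimal usage sketch, assuming Manifolds.jl and the Rayleigh-quotient setup from the introduction above (variable names are illustrative):

```julia
using Manopt, Manifolds, LinearAlgebra

A = Symmetric([2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 4.0])
M = Sphere(2)
f(M, p) = p' * A * p
grad_f(M, p) = project(M, p, 2A * p)
Hess_f(M, p, X) = project(M, p, 2A * X) - 2f(M, p) * X

p0 = [1.0, 0.0, 0.0]
q1 = trust_regions(M, f, grad_f, Hess_f, p0)  # with the exact Hessian
q2 = trust_regions(M, f, grad_f, p0)          # Hessian by finite differences
```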
See also

truncated_conjugate_gradient_descent
Manopt.trust_regions! — Function

trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)

evaluate the Riemannian trust-regions solver in place of p.
Input
- M: a manifold $\mathcal M$
- f: a cost function $f: \mathcal M → ℝ$ to minimize
- grad_f: the gradient $\operatorname{grad} f: \mathcal M → T\mathcal M$ of $f$
- Hess_f: (optional) the Hessian $\operatorname{Hess} f$
- p: an initial value $p ∈ \mathcal M$

If no Hessian is provided, it is approximated using finite differences, see ApproxHessianFiniteDifference.

For more details and all options, see trust_regions.
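A minimal in-place usage sketch, continuing the example above (f, grad_f, Hess_f, and p0 as defined there):

```julia
q = copy(M, p0)                          # q is overwritten with the result
trust_regions!(M, f, grad_f, Hess_f, q)
```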
State
Manopt.TrustRegionsState — Type

TrustRegionsState <: AbstractHessianSolverState
Store the state of the trust-regions solver.
Fields
All the following fields (besides p) can be set by specifying them as keywords.
- acceptance_rate: (0.1) a lower bound of the performance ratio for the iterate that decides if the iteration is accepted or not
- max_trust_region_radius: (sqrt(manifold_dimension(M))) the maximum trust-region radius
- p: (rand(M) if a manifold is provided) the current iterate
- project!: (copyto!) specify a projection operation for tangent vectors for numerical stability. A function (M, Y, p, X) -> ... working in place of Y. By default, no projection is performed; set it to project! to activate projection.
- stop: (StopAfterIteration(1000) | StopWhenGradientNormLess(1e-6))
- randomize: (false) indicates if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner is used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated.
- ρ_regularization: (10000.0) regularize the model fitness $ρ$ to avoid division by zero
- sub_problem: an AbstractManoptProblem or a function (M, p, X) -> q or (M, q, p, X) for a closed-form solution of the sub problem
- sub_state: (TruncatedConjugateGradientState(M, p, X))
- σ: (0.0 or 1e-6, depending on randomize) Gaussian standard deviation when creating the random initial tangent vector
- trust_region_radius: (max_trust_region_radius / 8) the (initial) trust-region radius
- X: (zero_vector(M,p)) the current gradient grad_f(p). Use this default to specify the type of tangent vector to allocate also for the internal (tangent vector) fields.
Internal fields
- HX, HY, HZ: interim storage (to avoid allocation) of $\operatorname{Hess} f(p)[⋅]$ applied to X, Y, Z
- Y: the solution (tangent vector) of the subsolver
- Z: the Cauchy point (only used if randomize is activated)
Constructors
All the following constructors have the fields as keyword arguments with the defaults given in brackets. If no initial point p is provided, p=rand(M) is used.

TrustRegionsState(M, mho; kwargs...)
TrustRegionsState(M, p, mho; kwargs...)

A trust region state, where the sub problem is set to a DefaultManoptProblem on the tangent space using the TrustRegionModelObjective to be solved with truncated_conjugate_gradient_descent!; in other words, the sub state is set to TruncatedConjugateGradientState.

TrustRegionsState(M, sub_problem, sub_state; kwargs...)
TrustRegionsState(M, p, sub_problem, sub_state; kwargs...)

A trust region state, where the sub problem is solved using an AbstractManoptProblem sub_problem and an AbstractManoptSolverState sub_state.

TrustRegionsState(M, f::Function; evaluation=AllocatingEvaluation, kwargs...)
TrustRegionsState(M, p, f; evaluation=AllocatingEvaluation, kwargs...)

A trust region state, where the sub problem is solved in closed form by a function f(M, p, Y, Δ), where p is the current iterate, Y the initial tangent vector at p, and Δ the current trust region radius. A sketch of such a function follows below.
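A hedged sketch of such a closed-form sub solver: a simple step along the negative gradient, truncated to the trust region radius, stands in for an actual minimizer of the quadratic model; cauchy_like_step and the Rayleigh-quotient gradient are illustrative assumptions.

```julia
using Manopt, Manifolds, LinearAlgebra

A = Symmetric([2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 4.0])
M = Sphere(2)
grad_f(M, p) = project(M, p, 2A * p)

# Illustrative closed-form sub solver: step along -grad f(p),
# truncated to the trust region radius Δ; a real sub solver
# would minimize the quadratic model m_k instead.
function cauchy_like_step(M, p, Y, Δ)
    X = -grad_f(M, p)
    nX = norm(M, p, X)
    return nX > Δ ? (Δ / nX) * X : X
end

trs = TrustRegionsState(M, cauchy_like_step)
```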
See also

trust_regions
Approximation of the Hessian
Several different methods to approximate the Hessian are available.
Manopt.ApproxHessianFiniteDifference — Type

ApproxHessianFiniteDifference{E, P, T, G, RTR, VTR, R <: Real} <: AbstractApproxHessian
A functor to approximate the Hessian by a finite difference of gradient evaluation.
Given a point p, a direction X, and the gradient $\operatorname{grad}F: \mathcal M → T\mathcal M$ of a function $F$, the Hessian is approximated as follows: let $c$ be a stepsize, $X ∈ T_p\mathcal M$ a tangent vector, and $q = \operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X)$ a step of length $c$ in direction $X$ following a retraction. Then the Hessian is approximated by a finite difference of the gradients, where $\mathcal T_{\cdot\gets\cdot}$ is a vector transport:
\[\operatorname{Hess}F(p)[X] ≈ \frac{\lVert X \rVert_p}{c}\Bigl( \mathcal T_{p\gets q}\bigl(\operatorname{grad}F(q)\bigr) - \operatorname{grad}F(p) \Bigr)\]
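A minimal sketch of this approximation, assuming Manifolds.jl; the helper name approx_hess_fd is illustrative (the actual functor in Manopt is ApproxHessianFiniteDifference):

```julia
using Manifolds

# Finite-difference Hessian approximation following the formula above,
# for an allocating gradient grad_f(M, p).
function approx_hess_fd(M, grad_f, p, X; c=2.0^-14)
    nX = norm(M, p, X)
    nX == 0 && return zero_vector(M, p)
    q = retract(M, p, (c / nX) * X)                  # step of length c along X
    Gq = vector_transport_to(M, q, grad_f(M, q), p)  # transport grad F(q) to T_p M
    return (nX / c) * (Gq - grad_f(M, p))
end
```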
Fields
- gradient!!: the gradient function (either allocating or mutating, see evaluation parameter)
- step_length: a step length for the finite difference
- retraction_method: a retraction to use
- vector_transport_method: a vector transport to use

Internal temporary fields

- grad_tmp: a temporary storage for the gradient at the current p
- grad_dir_tmp: a temporary storage for the gradient at the current p_dir
- p_dir::P: a temporary storage for the forward direction (the $q$ in the formula above)
Constructor
ApproxHessianFiniteDifference(M, p, grad_f; kwargs...)
Keyword arguments
- evaluation: (AllocatingEvaluation) whether the gradient is given as an allocating function or an in-place one (InplaceEvaluation)
- steplength: ($2^{-14}$) step length $c$ to approximate the gradient evaluations
- retraction_method: (default_retraction_method(M, typeof(p))) a retraction(M, p, X) to use in the approximation
- vector_transport_method: (default_vector_transport_method(M, typeof(p))) a vector transport to use
Manopt.ApproxHessianSymmetricRankOne — Type

ApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian
A functor to approximate the Hessian by the symmetric rank one update.
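For orientation, the classical Euclidean symmetric rank-one update, which this functor mirrors in the tangent space via the stored basis and vector transport, reads (with step $s_k$ and gradient difference $y_k$):

\[B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^{\mathrm{T}}}{(y_k - B_k s_k)^{\mathrm{T}} s_k}\]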
Fields
- gradient!!: the gradient function (either allocating or mutating, see evaluation parameter)
- ν: a small real number to ensure that the denominator in the update does not become too small, so that the method does not break down
- vector_transport_method: a vector transport to use

Internal temporary fields

- p_tmp: a temporary storage for the current point p
- grad_tmp: a temporary storage for the gradient at the current p
- matrix: a temporary storage for the matrix representation of the approximating operator
- basis: a temporary storage for an orthonormal basis at the current p
Constructor
ApproxHessianSymmetricRankOne(M, p, gradF; kwargs...)
Keyword arguments
- initial_operator: (Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator
- basis: (DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p
- nu: (-1) the parameter $ν$ from the fields
- evaluation: (AllocatingEvaluation) whether the gradient is given as an allocating function or an in-place one (InplaceEvaluation)
- vector_transport_method: (ParallelTransport()) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use
Manopt.ApproxHessianBFGS — Type

ApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessian
A functor to approximate the Hessian by the BFGS update.
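For orientation, the classical Euclidean BFGS update of a Hessian approximation, applied analogously in the tangent space, reads (with step $s_k$ and gradient difference $y_k$):

\[B_{k+1} = B_k - \frac{B_k s_k s_k^{\mathrm{T}} B_k}{s_k^{\mathrm{T}} B_k s_k} + \frac{y_k y_k^{\mathrm{T}}}{y_k^{\mathrm{T}} s_k}\]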
Fields
- gradient!!: the gradient function (either allocating or mutating, see evaluation parameter)
- scale
- vector_transport_method: a vector transport to use

Internal temporary fields

- p_tmp: a temporary storage for the current point p
- grad_tmp: a temporary storage for the gradient at the current p
- matrix: a temporary storage for the matrix representation of the approximating operator
- basis: a temporary storage for an orthonormal basis at the current p
Constructor
ApproxHessianBFGS(M, p, gradF; kwargs...)
Keyword arguments
- initial_operator: (Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator
- basis: (DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p
- nu: (-1)
- evaluation: (AllocatingEvaluation) whether the gradient is given as an allocating function or an in-place one (InplaceEvaluation)
- vector_transport_method: (ParallelTransport()) vector transport $\mathcal T_{\cdot\gets\cdot}$ to use
as well as their (non-exported) common supertype
Manopt.AbstractApproxHessian — Type

AbstractApproxHessian <: Function

An abstract supertype for approximate Hessian functions; it also declares them to be functions.
Technical details
The trust_regions solver requires the following functions of a manifold to be available:

- A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
- By default the stopping criterion uses the norm as well, to stop when the norm of the gradient is small; if you implemented inner, the norm is provided already.
- If you do not provide an initial max_trust_region_radius, a manifold_dimension is required.
- A copyto!(M, q, p) and copy(M, p) for points.
- By default the tangent vectors are initialized calling zero_vector(M, p).
Literature
- [ABG06] P.-A. Absil, C. Baker and K. Gallivan. Trust-Region Methods on Riemannian Manifolds. Foundations of Computational Mathematics 7, 303–330 (2006).
- [CGT00] A. R. Conn, N. I. Gould and P. L. Toint. Trust Region Methods (Society for Industrial and Applied Mathematics, 2000).