The Riemannian trust regions solver
Minimize a function
\[\operatorname*{\arg\,min}_{p ∈ \mathcal{M}}\ f(p)\]
by using the Riemannian trust-regions solver following [ABG06] a model is build by lifting the objective at the $k$th iterate $p_k$ by locally mapping the cost function $f$ to the tangent space as $f_k: T_{p_k}\mathcal M → ℝ$ as $f_k(X) = f(\operatorname{retr}_{p_k}(X))$. The trust region subproblem is then defined as
\[\operatorname*{arg\,min}_{X ∈ T_{p_k}\mathcal M}\ m_k(X),\]
where
\[\begin{align*} m_k&: T_{p_K}\mathcal M → ℝ,\\ m_k(X) &= f(p_k) + ⟨\operatorname{grad} f(p_k), X⟩_{p_k} + \frac{1}{2}\langle \mathcal H_k(X),X⟩_{p_k}\\ \text{such that}&\ \lVert X \rVert_{p_k} ≤ Δ_k. \end{align*}\]
Here $Δ_k$ is a trust region radius, that is adapted every iteration, and $\mathcal H_k$ is some symmetric linear operator that approximates the Hessian $\operatorname{Hess} f$ of $f$.
Interface
Manopt.trust_regions — Functiontrust_regions(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
trust_regions(M, f, grad_f, p=rand(M); kwargs...)
trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)run the Riemannian trust-regions solver for optimization on manifolds to minimize f, see on [ABG06, CGT00].
For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference. For solving the inner trust-region subproblem of finding an update-vector, by default the truncated_conjugate_gradient_descent is used.
Input
M::AbstractManifold: a Riemannian manifold $\mathcal M$f: a cost function $f: \mathcal M→ ℝ$ implemented as(M, p) -> vgrad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function(M, p) -> Xor a function(M, X, p) -> XcomputingXin-placeHess_f: the (Riemannian) Hessian $\operatorname{Hess}f: T_{p}\mathcal M → T_{p}\mathcal M$ of f as a function(M, p, X) -> Yor a function(M, Y, p, X) -> YcomputingYin-placep: a point on the manifold $\mathcal M$
Keyword arguments
acceptance_rate: accept/reject threshold: if ρ (the performance ratio for the iterate) is at least the acceptance rate ρ', the candidate is accepted. This value should be between $0$ and $rac{1}{4}$augmentation_threshold=0.75: trust-region augmentation threshold: if ρ is larger than this threshold, a solution is on the trust region boundary and negative curvature, and the radius is extended (augmented)augmentation_factor=2.0: trust-region augmentation factorevaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.κ=0.1: the linear convergence target rate of the tCG methodtruncated_conjugate_gradient_descent, and is used in a stopping criterion thereinmax_trust_region_radius: the maximum trust-region radiuspreconditioner: a preconditioner for the Hessian H. This is either an allocating function(M, p, X) -> Yor an in-place function(M, Y, p, X) -> Y, seeevaluation, and by default set to the identity.project!=copyto!: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace ofY, that is(M, Y, p, X) -> Y, whereXandYcan be the same memory.randomize=false: indicate whetherXis initialised to a random vector or not. This disables preconditioning.ρ_regularization=1e3: regularize the performance evaluation $ρ$ to avoid numerical inaccuracies.reduction_factor=0.25: trust-region reduction factorreduction_threshold=0.1: trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced byreduction_factor.retraction_method=default_retraction_method(M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractionsstopping_criterion=StopAfterIteration(1000)|StopWhenGradientNormLess(1e-6): a functor indicating that the stopping criterion is fulfilledsub_kwargs=(;): a named tuple of keyword arguments that are passed todecorate_objective!of the sub solvers objective, thedecorate_state!of the subsovlers state, and the sub state constructor itself.sub_stopping_criterion=( seetruncated_conjugate_gradient_descent): a functor indicating that the stopping criterion is fulfilledsub_problem=DefaultManoptProblem(M,ConstrainedManifoldObjective(subcost, subgrad; evaluation=evaluation)): specify a problem for a solver or a closed form solution function, which can be allocating or in-place.sub_state=QuasiNewtonState: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function. whereQuasiNewtonLimitedMemoryDirectionUpdatewithInverseBFGSis usedθ=1.0: the superlinear convergence target rate of $1+θ$ of the tCG-methodtruncated_conjugate_gradient_descent, and is used in a stopping criterion thereintrust_region_radius=injectivity_radius(M) / 4: the initial trust-region radius
For the case that no Hessian is provided, the Hessian is computed using finite difference, see ApproxHessianFiniteDifference.
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
See also
Manopt.trust_regions! — Functiontrust_regions(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
trust_regions(M, f, grad_f, p=rand(M); kwargs...)
trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)run the Riemannian trust-regions solver for optimization on manifolds to minimize f, see on [ABG06, CGT00].
For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference. For solving the inner trust-region subproblem of finding an update-vector, by default the truncated_conjugate_gradient_descent is used.
Input
M::AbstractManifold: a Riemannian manifold $\mathcal M$f: a cost function $f: \mathcal M→ ℝ$ implemented as(M, p) -> vgrad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function(M, p) -> Xor a function(M, X, p) -> XcomputingXin-placeHess_f: the (Riemannian) Hessian $\operatorname{Hess}f: T_{p}\mathcal M → T_{p}\mathcal M$ of f as a function(M, p, X) -> Yor a function(M, Y, p, X) -> YcomputingYin-placep: a point on the manifold $\mathcal M$
Keyword arguments
acceptance_rate: accept/reject threshold: if ρ (the performance ratio for the iterate) is at least the acceptance rate ρ', the candidate is accepted. This value should be between $0$ and $rac{1}{4}$augmentation_threshold=0.75: trust-region augmentation threshold: if ρ is larger than this threshold, a solution is on the trust region boundary and negative curvature, and the radius is extended (augmented)augmentation_factor=2.0: trust-region augmentation factorevaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.κ=0.1: the linear convergence target rate of the tCG methodtruncated_conjugate_gradient_descent, and is used in a stopping criterion thereinmax_trust_region_radius: the maximum trust-region radiuspreconditioner: a preconditioner for the Hessian H. This is either an allocating function(M, p, X) -> Yor an in-place function(M, Y, p, X) -> Y, seeevaluation, and by default set to the identity.project!=copyto!: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace ofY, that is(M, Y, p, X) -> Y, whereXandYcan be the same memory.randomize=false: indicate whetherXis initialised to a random vector or not. This disables preconditioning.ρ_regularization=1e3: regularize the performance evaluation $ρ$ to avoid numerical inaccuracies.reduction_factor=0.25: trust-region reduction factorreduction_threshold=0.1: trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced byreduction_factor.retraction_method=default_retraction_method(M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractionsstopping_criterion=StopAfterIteration(1000)|StopWhenGradientNormLess(1e-6): a functor indicating that the stopping criterion is fulfilledsub_kwargs=(;): a named tuple of keyword arguments that are passed todecorate_objective!of the sub solvers objective, thedecorate_state!of the subsovlers state, and the sub state constructor itself.sub_stopping_criterion=( seetruncated_conjugate_gradient_descent): a functor indicating that the stopping criterion is fulfilledsub_problem=DefaultManoptProblem(M,ConstrainedManifoldObjective(subcost, subgrad; evaluation=evaluation)): specify a problem for a solver or a closed form solution function, which can be allocating or in-place.sub_state=QuasiNewtonState: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function. whereQuasiNewtonLimitedMemoryDirectionUpdatewithInverseBFGSis usedθ=1.0: the superlinear convergence target rate of $1+θ$ of the tCG-methodtruncated_conjugate_gradient_descent, and is used in a stopping criterion thereintrust_region_radius=injectivity_radius(M) / 4: the initial trust-region radius
For the case that no Hessian is provided, the Hessian is computed using finite difference, see ApproxHessianFiniteDifference.
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
See also
State
Manopt.TrustRegionsState — TypeTrustRegionsState <: AbstractHessianSolverStateStore the state of the trust-regions solver.
Fields
acceptance_rate: a lower bound of the performance ratio for the iterate that decides if the iteration is accepted or not.HX,HY,HZ: interim storage (to avoid allocation) of `\operatorname{Hess} f(p)[⋅]ofX,Y,Zmax_trust_region_radius: the maximum trust-region radiusp::P: a point on the manifold $\mathcal M$ storing the current iterateproject!: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace ofY, that is(M, Y, p, X) -> Y, whereXandYcan be the same memory.stop::StoppingCriterion: a functor indicating that the stopping criterion is fulfilledrandomize: indicate whetherXis initialised to a random vector or notρ_regularization: regularize the model fitness $ρ$ to avoid division by zerosub_problem::Union{AbstractManoptProblem, F}: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.sub_state::Union{AbstractManoptProblem, F}: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function.σ: Gaussian standard deviation when creating the random initial tangent vector This field has no effect, whenrandomizeis false.trust_region_radius: the trust-region radiusX::T: a tangent vector at the point $p$ on the manifold $\mathcal M$Y: the solution (tangent vector) of the subsolverZ: the Cauchy point (only used if random is activated)
Constructors
TrustRegionsState(M, mho::AbstractManifoldHessianObjective; kwargs...)
TrustRegionsState(M, sub_problem, sub_state; kwargs...)
TrustRegionsState(M, sub_problem; evaluation=AllocatingEvaluation(), kwargs...)create a trust region state.
- given a
AbstractManifoldHessianObjectivemho, the default sub solver, aTruncatedConjugateGradientStatewithmhoused to define the problem on a tangent space is created - given a
sub_problemand anevaluation=keyword, the sub problem solver is assumed to be the closed form solution, whereevaluationdetermines how to call the sub function.
Input
M::AbstractManifold: a Riemannian manifold $\mathcal M$sub_problem: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.sub_state: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function.
Keyword arguments
acceptance_rate=0.1max_trust_region_radius=sqrt(manifold_dimension(M))p=rand(M): a point on the manifold $\mathcal M$ to specify the initial valueproject!=copyto!stopping_criterion=StopAfterIteration(1000)|StopWhenGradientNormLess(1e-6): a functor indicating that the stopping criterion is fulfilledrandomize=falseρ_regularization=10000.0θ=1.0trust_region_radius=max_trust_region_radius / 8X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$to specify the representation of a tangent vector
See also
Approximation of the Hessian
Several different methods to approximate the Hessian are available.
Manopt.ApproxHessianFiniteDifference — TypeApproxHessianFiniteDifference{E, P, T, G, RTR, VTR, R <: Real} <: AbstractApproxHessianA functor to approximate the Hessian by a finite difference of gradient evaluation.
Given a point p and a direction X and the gradient $\operatorname{grad} f(p)$ of a function $f$ the Hessian is approximated as follows: let $c$ be a stepsize, $X ∈ T_{p}\mathcal M$ a tangent vector and $q = \operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X)$ be a step in direction $X$ of length $c$ following a retraction Then the Hessian is approximated by the finite difference of the gradients, where $\mathcal T_{⋅←⋅}$ is a vector transport.
\[\operatorname{Hess}f(p)[X] ≈ \frac{\lVert X \rVert_p}{c}\Bigl( \mathcal T_{p\gets q}\bigr(\operatorname{grad}f(q)\bigl) - \operatorname{grad}f(p) \Bigl)\]
Fields
gradient!!: the gradient function (either allocating or mutating, seeevaluationparameter)step_length: a step length for the finite differenceretraction_method=default_retraction_method(M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractionsvector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Internal temporary fields
grad_tmp: a temporary storage for the gradient at the currentpgrad_dir_tmp: a temporary storage for the gradient at the currentp_dirp_dir::P: a temporary storage to the forward direction (or the $q$ in the formula)
Constructor
ApproximateFiniteDifference(M, p, grad_f; kwargs...)Keyword arguments
evaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.steplength=2^{-14}$: step length$c`` to approximate the gradient evaluationsretraction_method=default_retraction_method(M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractionsvector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Manopt.ApproxHessianSymmetricRankOne — TypeApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessianA functor to approximate the Hessian by the symmetric rank one update.
Fields
gradient!!: the gradient function (either allocating or mutating, seeevaluationparameter).ν: a small real number to ensure that the denominator in the update does not become too small and thus the method does not break down.vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports.
Internal temporary fields
p_tmp: a temporary storage the current pointp.grad_tmp: a temporary storage for the gradient at the currentp.matrix: a temporary storage for the matrix representation of the approximating operator.basis: a temporary storage for an orthonormal basis at the currentp.
Constructor
ApproxHessianSymmetricRankOne(M, p, gradF; kwargs...)Keyword arguments
initial_operator(Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.basis(DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.nu(-1)evaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Manopt.ApproxHessianBFGS — TypeApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessianA functor to approximate the Hessian by the BFGS update.
Fields
gradient!!the gradient function (either allocating or mutating, seeevaluationparameter).scalevector_transport_method::AbstractVectorTransportMethodP: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Internal temporary fields
p_tmpa temporary storage the current pointp.grad_tmpa temporary storage for the gradient at the currentp.matrixa temporary storage for the matrix representation of the approximating operator.basisa temporary storage for an orthonormal basis at the currentp.
Constructor
ApproxHessianBFGS(M, p, gradF; kwargs...)Keyword arguments
initial_operator(Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.basis(DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.nu(-1)evaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
as well as their (non-exported) common supertype
Manopt.AbstractApproxHessian — TypeAbstractApproxHessian <: FunctionAn abstract supertype for approximate Hessian functions, declares them also to be functions.
Technical details
The trust_regions solver requires the following functions of a manifold to be available
- A
retract!(M, q, p, X); it is recommended to set thedefault_retraction_methodto a favourite retraction. If this default is set, aretraction_method=does not have to be specified. - By default the stopping criterion uses the
normas well, to stop when the norm of the gradient is small, but if you implementedinner, the norm is provided already. - if you do not provide an initial
max_trust_region_radius, amanifold_dimensionis required. - A
copyto!(M, q, p)andcopy(M,p)for points. - By default the tangent vectors are initialized calling
zero_vector(M,p).
Literature
- [ABG06]
- P.-A. Absil, C. Baker and K. Gallivan. Trust-Region Methods on Riemannian Manifolds. Foundations of Computational Mathematics 7, 303–330 (2006).
- [CGT00]
- A. R. Conn, N. I. Gould and P. L. Toint. Trust Region Methods (Society for Industrial and Applied Mathematics, 2000).