The Riemannian trust regions solver
Minimize a function
\[\operatorname*{\arg\,min}_{p ∈ \mathcal{M}}\ f(p)\]
by using the Riemannian trust-regions solver following [ABG06] a model is build by lifting the objective at the $k$th iterate $p_k$ by locally mapping the cost function $f$ to the tangent space as $f_k: T_{p_k}\mathcal M → ℝ$ as $f_k(X) = f(\operatorname{retr}_{p_k}(X))$. The trust region subproblem is then defined as
\[\operatorname*{arg\,min}_{X ∈ T_{p_k}\mathcal M}\ m_k(X),\]
where
\[\begin{align*} m_k&: T_{p_K}\mathcal M → ℝ,\\ m_k(X) &= f(p_k) + ⟨\operatorname{grad} f(p_k), X⟩_{p_k} + \frac{1}{2}\langle \mathcal H_k(X),X⟩_{p_k}\\ \text{such that}&\ \lVert X \rVert_{p_k} ≤ Δ_k. \end{align*}\]
Here $Δ_k$ is a trust region radius, that is adapted every iteration, and $\mathcal H_k$ is some symmetric linear operator that approximates the Hessian $\operatorname{Hess} f$ of $f$.
Interface
Manopt.trust_regions — Functiontrust_regions(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
trust_regions(M, f, grad_f, p=rand(M); kwargs...)
trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)run the Riemannian trust-regions solver for optimization on manifolds to minimize f, see on [ABG06, CGT00].
For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference. For solving the inner trust-region subproblem of finding an update-vector, by default the truncated_conjugate_gradient_descent is used.
Input
- M::- AbstractManifold- : a Riemannian manifold $\mathcal M$
- f: a cost function $f: \mathcal M→ ℝ$ implemented as- (M, p) -> v
- grad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function- (M, p) -> Xor a function- (M, X, p) -> Xcomputing- Xin-place
- Hess_f: the (Riemannian) Hessian $\operatorname{Hess}f: T_{p}\mathcal M → T_{p}\mathcal M$ of f as a function- (M, p, X) -> Yor a function- (M, Y, p, X) -> Ycomputing- Yin-place
- p: a point on the manifold $\mathcal M$
Keyword arguments
- acceptance_rate: accept/reject threshold: if ρ (the performance ratio for the iterate) is at least the acceptance rate ρ', the candidate is accepted. This value should be between $0$ and $rac{1}{4}$
- augmentation_threshold=0.75: trust-region augmentation threshold: if ρ is larger than this threshold, a solution is on the trust region boundary and negative curvature, and the radius is extended (augmented)
- augmentation_factor=2.0: trust-region augmentation factor
- evaluation=- AllocatingEvaluation- (): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (- AllocatingEvaluation) or whether they modify their input argument to return the result therein (- InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.
- κ=0.1: the linear convergence target rate of the tCG method- truncated_conjugate_gradient_descent, and is used in a stopping criterion therein
- max_trust_region_radius: the maximum trust-region radius
- preconditioner: a preconditioner for the Hessian H. This is either an allocating function- (M, p, X) -> Yor an in-place function- (M, Y, p, X) -> Y, see- evaluation, and by default set to the identity.
- project!=copyto!: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace of- Y, that is- (M, Y, p, X) -> Y, where- Xand- Ycan be the same memory.
- randomize=false: indicate whether- Xis initialised to a random vector or not. This disables preconditioning.
- ρ_regularization=1e3: regularize the performance evaluation $ρ$ to avoid numerical inaccuracies.
- reduction_factor=0.25: trust-region reduction factor
- reduction_threshold=0.1: trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced by- reduction_factor.
- retraction_method=- default_retraction_method- (M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractions
- stopping_criterion=- StopAfterIteration- (1000)- |- StopWhenGradientNormLess- (1e-6): a functor indicating that the stopping criterion is fulfilled
- sub_kwargs=- (;): a named tuple of keyword arguments that are passed to- decorate_objective!of the sub solvers objective, the- decorate_state!of the subsovlers state, and the sub state constructor itself.
- sub_stopping_criterion=( see- truncated_conjugate_gradient_descent): a functor indicating that the stopping criterion is fulfilled
- sub_problem=- DefaultManoptProblem- (M,- ConstrainedManifoldObjective- (subcost, subgrad; evaluation=evaluation)): specify a problem for a solver or a closed form solution function, which can be allocating or in-place.
- sub_state=- QuasiNewtonState: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function. where- QuasiNewtonLimitedMemoryDirectionUpdatewith- InverseBFGSis used
- θ=1.0: the superlinear convergence target rate of $1+θ$ of the tCG-method- truncated_conjugate_gradient_descent, and is used in a stopping criterion therein
- trust_region_radius=- injectivity_radius- (M) / 4: the initial trust-region radius
For the case that no Hessian is provided, the Hessian is computed using finite difference, see ApproxHessianFiniteDifference.
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
See also
Manopt.trust_regions! — Functiontrust_regions(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
trust_regions(M, f, grad_f, p=rand(M); kwargs...)
trust_regions!(M, f, grad_f, Hess_f, p; kwargs...)
trust_regions!(M, f, grad_f, p; kwargs...)run the Riemannian trust-regions solver for optimization on manifolds to minimize f, see on [ABG06, CGT00].
For the case that no Hessian is provided, the Hessian is computed using finite differences, see ApproxHessianFiniteDifference. For solving the inner trust-region subproblem of finding an update-vector, by default the truncated_conjugate_gradient_descent is used.
Input
- M::- AbstractManifold- : a Riemannian manifold $\mathcal M$
- f: a cost function $f: \mathcal M→ ℝ$ implemented as- (M, p) -> v
- grad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function- (M, p) -> Xor a function- (M, X, p) -> Xcomputing- Xin-place
- Hess_f: the (Riemannian) Hessian $\operatorname{Hess}f: T_{p}\mathcal M → T_{p}\mathcal M$ of f as a function- (M, p, X) -> Yor a function- (M, Y, p, X) -> Ycomputing- Yin-place
- p: a point on the manifold $\mathcal M$
Keyword arguments
- acceptance_rate: accept/reject threshold: if ρ (the performance ratio for the iterate) is at least the acceptance rate ρ', the candidate is accepted. This value should be between $0$ and $rac{1}{4}$
- augmentation_threshold=0.75: trust-region augmentation threshold: if ρ is larger than this threshold, a solution is on the trust region boundary and negative curvature, and the radius is extended (augmented)
- augmentation_factor=2.0: trust-region augmentation factor
- evaluation=- AllocatingEvaluation- (): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (- AllocatingEvaluation) or whether they modify their input argument to return the result therein (- InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.
- κ=0.1: the linear convergence target rate of the tCG method- truncated_conjugate_gradient_descent, and is used in a stopping criterion therein
- max_trust_region_radius: the maximum trust-region radius
- preconditioner: a preconditioner for the Hessian H. This is either an allocating function- (M, p, X) -> Yor an in-place function- (M, Y, p, X) -> Y, see- evaluation, and by default set to the identity.
- project!=copyto!: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace of- Y, that is- (M, Y, p, X) -> Y, where- Xand- Ycan be the same memory.
- randomize=false: indicate whether- Xis initialised to a random vector or not. This disables preconditioning.
- ρ_regularization=1e3: regularize the performance evaluation $ρ$ to avoid numerical inaccuracies.
- reduction_factor=0.25: trust-region reduction factor
- reduction_threshold=0.1: trust-region reduction threshold: if ρ is below this threshold, the trust region radius is reduced by- reduction_factor.
- retraction_method=- default_retraction_method- (M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractions
- stopping_criterion=- StopAfterIteration- (1000)- |- StopWhenGradientNormLess- (1e-6): a functor indicating that the stopping criterion is fulfilled
- sub_kwargs=- (;): a named tuple of keyword arguments that are passed to- decorate_objective!of the sub solvers objective, the- decorate_state!of the subsovlers state, and the sub state constructor itself.
- sub_stopping_criterion=( see- truncated_conjugate_gradient_descent): a functor indicating that the stopping criterion is fulfilled
- sub_problem=- DefaultManoptProblem- (M,- ConstrainedManifoldObjective- (subcost, subgrad; evaluation=evaluation)): specify a problem for a solver or a closed form solution function, which can be allocating or in-place.
- sub_state=- QuasiNewtonState: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function. where- QuasiNewtonLimitedMemoryDirectionUpdatewith- InverseBFGSis used
- θ=1.0: the superlinear convergence target rate of $1+θ$ of the tCG-method- truncated_conjugate_gradient_descent, and is used in a stopping criterion therein
- trust_region_radius=- injectivity_radius- (M) / 4: the initial trust-region radius
For the case that no Hessian is provided, the Hessian is computed using finite difference, see ApproxHessianFiniteDifference.
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
See also
State
Manopt.TrustRegionsState — TypeTrustRegionsState <: AbstractHessianSolverStateStore the state of the trust-regions solver.
Fields
- acceptance_rate: a lower bound of the performance ratio for the iterate that decides if the iteration is accepted or not.
- HX,- HY,- HZ: interim storage (to avoid allocation) of `- \operatorname{Hess} f(p)[⋅]of- X,- Y,- Z
- max_trust_region_radius: the maximum trust-region radius
- p::P: a point on the manifold $\mathcal M$ storing the current iterate
- project!: for numerical stability it is possible to project onto the tangent space after every iteration. the function has to work inplace of- Y, that is- (M, Y, p, X) -> Y, where- Xand- Ycan be the same memory.
- stop::StoppingCriterion: a functor indicating that the stopping criterion is fulfilled
- randomize: indicate whether- Xis initialised to a random vector or not
- ρ_regularization: regularize the model fitness $ρ$ to avoid division by zero
- sub_problem::Union{AbstractManoptProblem, F}: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.
- sub_state::Union{AbstractManoptProblem, F}: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function.
- σ: Gaussian standard deviation when creating the random initial tangent vector This field has no effect, when- randomizeis false.
- trust_region_radius: the trust-region radius
- X::T: a tangent vector at the point $p$ on the manifold $\mathcal M$
- Y: the solution (tangent vector) of the subsolver
- Z: the Cauchy point (only used if random is activated)
Constructors
TrustRegionsState(M, mho::AbstractManifoldHessianObjective; kwargs...)
TrustRegionsState(M, sub_problem, sub_state; kwargs...)
TrustRegionsState(M, sub_problem; evaluation=AllocatingEvaluation(), kwargs...)create a trust region state.
- given a AbstractManifoldHessianObjectivemho, the default sub solver, aTruncatedConjugateGradientStatewithmhoused to define the problem on a tangent space is created
- given a sub_problemand anevaluation=keyword, the sub problem solver is assumed to be the closed form solution, whereevaluationdetermines how to call the sub function.
Input
- M::- AbstractManifold- : a Riemannian manifold $\mathcal M$
- sub_problem: specify a problem for a solver or a closed form solution function, which can be allocating or in-place.
- sub_state: a state to specify the sub solver to use. For a closed form solution, this indicates the type of function.
Keyword arguments
- acceptance_rate=0.1
- max_trust_region_radius=sqrt(manifold_dimension(M))
- p=- rand- (M): a point on the manifold $\mathcal M$ to specify the initial value
- project!=copyto!
- stopping_criterion=- StopAfterIteration- (1000)- |- StopWhenGradientNormLess- (1e-6): a functor indicating that the stopping criterion is fulfilled
- randomize=false
- ρ_regularization=10000.0
- θ=1.0
- trust_region_radius=max_trust_region_radius / 8
- X=- zero_vector- (M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$to specify the representation of a tangent vector
See also
Approximation of the Hessian
Several different methods to approximate the Hessian are available.
Manopt.ApproxHessianFiniteDifference — TypeApproxHessianFiniteDifference{E, P, T, G, RTR, VTR, R <: Real} <: AbstractApproxHessianA functor to approximate the Hessian by a finite difference of gradient evaluation.
Given a point p and a direction X and the gradient $\operatorname{grad} f(p)$ of a function $f$ the Hessian is approximated as follows: let $c$ be a stepsize, $X ∈ T_{p}\mathcal M$ a tangent vector and $q = \operatorname{retr}_p(\frac{c}{\lVert X \rVert_p}X)$ be a step in direction $X$ of length $c$ following a retraction Then the Hessian is approximated by the finite difference of the gradients, where $\mathcal T_{⋅←⋅}$ is a vector transport.
\[\operatorname{Hess}f(p)[X] ≈ \frac{\lVert X \rVert_p}{c}\Bigl( \mathcal T_{p\gets q}\bigr(\operatorname{grad}f(q)\bigl) - \operatorname{grad}f(p) \Bigl)\]
Fields
- gradient!!: the gradient function (either allocating or mutating, see- evaluationparameter)
- step_length: a step length for the finite difference
- retraction_method=- default_retraction_method- (M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractions
- vector_transport_method=- default_vector_transport_method- (M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Internal temporary fields
- grad_tmp: a temporary storage for the gradient at the current- p
- grad_dir_tmp: a temporary storage for the gradient at the current- p_dir
- p_dir::P: a temporary storage to the forward direction (or the $q$ in the formula)
Constructor
ApproximateFiniteDifference(M, p, grad_f; kwargs...)Keyword arguments
- evaluation=- AllocatingEvaluation- (): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (- AllocatingEvaluation) or whether they modify their input argument to return the result therein (- InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.
- steplength=2^{-14}$: step length$c`` to approximate the gradient evaluations
- retraction_method=- default_retraction_method- (M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractions
- vector_transport_method=- default_vector_transport_method- (M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Manopt.ApproxHessianSymmetricRankOne — TypeApproxHessianSymmetricRankOne{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessianA functor to approximate the Hessian by the symmetric rank one update.
Fields
- gradient!!: the gradient function (either allocating or mutating, see- evaluationparameter).
- ν: a small real number to ensure that the denominator in the update does not become too small and thus the method does not break down.
- vector_transport_method=- default_vector_transport_method- (M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports.
Internal temporary fields
- p_tmp: a temporary storage the current point- p.
- grad_tmp: a temporary storage for the gradient at the current- p.
- matrix: a temporary storage for the matrix representation of the approximating operator.
- basis: a temporary storage for an orthonormal basis at the current- p.
Constructor
ApproxHessianSymmetricRankOne(M, p, gradF; kwargs...)Keyword arguments
- initial_operator(- Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.
- basis(- DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.
- nu(- -1)
- evaluation=- AllocatingEvaluation- (): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (- AllocatingEvaluation) or whether they modify their input argument to return the result therein (- InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.
- vector_transport_method=- default_vector_transport_method- (M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Manopt.ApproxHessianBFGS — TypeApproxHessianBFGS{E, P, G, T, B<:AbstractBasis{ℝ}, VTR, R<:Real} <: AbstractApproxHessianA functor to approximate the Hessian by the BFGS update.
Fields
- gradient!!the gradient function (either allocating or mutating, see- evaluationparameter).
- scale
- vector_transport_method::AbstractVectorTransportMethodP: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Internal temporary fields
- p_tmpa temporary storage the current point- p.
- grad_tmpa temporary storage for the gradient at the current- p.
- matrixa temporary storage for the matrix representation of the approximating operator.
- basisa temporary storage for an orthonormal basis at the current- p.
Constructor
ApproxHessianBFGS(M, p, gradF; kwargs...)Keyword arguments
- initial_operator(- Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))) the matrix representation of the initial approximating operator.
- basis(- DefaultOrthonormalBasis()) an orthonormal basis in the tangent space of the initial iterate p.
- nu(- -1)
- evaluation=- AllocatingEvaluation- (): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating its result (- AllocatingEvaluation) or whether they modify their input argument to return the result therein (- InplaceEvaluation). Since usually the first argument is the manifold, the modified argument is the second.
- vector_transport_method=- default_vector_transport_method- (M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
as well as their (non-exported) common supertype
Manopt.AbstractApproxHessian — TypeAbstractApproxHessian <: FunctionAn abstract supertype for approximate Hessian functions, declares them also to be functions.
Technical details
The trust_regions solver requires the following functions of a manifold to be available
- A retract!(M, q, p, X); it is recommended to set thedefault_retraction_methodto a favourite retraction. If this default is set, aretraction_method=does not have to be specified.
- By default the stopping criterion uses the normas well, to stop when the norm of the gradient is small, but if you implementedinner, the norm is provided already.
- if you do not provide an initial max_trust_region_radius, amanifold_dimensionis required.
- A copyto!(M, q, p)andcopy(M,p)for points.
- By default the tangent vectors are initialized calling zero_vector(M,p).
Literature
- [ABG06]
- P.-A. Absil, C. Baker and K. Gallivan. Trust-Region Methods on Riemannian Manifolds. Foundations of Computational Mathematics 7, 303–330 (2006).
- [CGT00]
- A. R. Conn, N. I. Gould and P. L. Toint. Trust Region Methods (Society for Industrial and Applied Mathematics, 2000).