Adaptive regularization with cubics
Manopt.adaptive_regularization_with_cubics — Function

    adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p=rand(M); kwargs...)
    adaptive_regularization_with_cubics(M, f, grad_f, p=rand(M); kwargs...)
    adaptive_regularization_with_cubics(M, mho, p=rand(M); kwargs...)
Solve an optimization problem on the manifold M
by iteratively minimizing
\[ m_k(X) = f(p_k) + ⟨X, \operatorname{grad} f(p_k)⟩ + \frac{1}{2}⟨X, \operatorname{Hess} f(p_k)[X]⟩ + \frac{σ_k}{3}\lVert X \rVert^3\]
on the tangent space at the current iterate $p_k$, where $X ∈ T_{p_k}\mathcal M$ and $σ_k > 0$ is a regularization parameter.
Let $X_k$ denote the minimizer of the model $m_k$ and use the model improvement
\[ ρ_k = \frac{f(p_k) - f(\operatorname{retr}_{p_k}(X_k))}{m_k(0) - m_k(X_k) + \frac{σ_k}{3}\lVert X_k\rVert^3}.\]
With two thresholds $η_2 ≥ η_1 > 0$, set $p_{k+1} = \operatorname{retr}_{p_k}(X_k)$ if $ρ_k ≥ η_1$ and reject the candidate otherwise, that is, set $p_{k+1} = p_k$.
Further update the regularization parameter using factors $0 < γ_1 < 1 < γ_2$:
\[σ_{k+1} = \begin{cases} \max\{σ_{\min}, γ_1σ_k\} & \text{ if } ρ_k \geq η_2 &\text{ (the model was very successful)},\\ σ_k & \text{ if } ρ_k ∈ [η_1, η_2)&\text{ (the model was successful)},\\ γ_2σ_k & \text{ if } ρ_k < η_1&\text{ (the model was unsuccessful)}. \end{cases}\]
For more details see [ABBC20].
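As a concrete illustration, the smallest eigenvalue of a symmetric matrix can be computed by minimizing the Rayleigh quotient on a sphere. The following is a minimal sketch, assuming Manifolds.jl is available; the names `f`, `grad_f`, `Hess_f`, `A`, and `p0` are illustrative and not part of Manopt.

```julia
# Sketch: minimize the Rayleigh quotient on the sphere with ARC.
using Manopt, Manifolds, LinearAlgebra, Random

Random.seed!(42)
n = 50
A = Symmetric(randn(n, n))

M = Sphere(n - 1)
f(M, p) = p' * A * p                                   # cost
grad_f(M, p) = 2 * (A * p - (p' * A * p) * p)          # Riemannian gradient (projection of 2Ap)
Hess_f(M, p, X) = 2 * (A * X - (p' * A * X) * p - (p' * A * p) * X)  # Riemannian Hessian

p0 = rand(M)
q = adaptive_regularization_with_cubics(M, f, grad_f, Hess_f, p0)
f(M, q)  # ≈ smallest eigenvalue of A
```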
Input

- `M`: a manifold $\mathcal M$
- `f`: a cost function $F: \mathcal M → ℝ$ to minimize
- `grad_f`: the gradient $\operatorname{grad}F: \mathcal M → T \mathcal M$ of $F$
- `Hess_f`: (optional) the Hessian $H( \mathcal M, x, ξ)$ of $F$
- `p`: an initial value $p ∈ \mathcal M$

If no Hessian is provided, it is approximated by finite differences, see `ApproxHessianFiniteDifference`.

The cost `f` and its gradient and Hessian might also be provided as a `ManifoldHessianObjective`, as sketched below.
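The following sketch bundles the three functions from the example above into an objective and passes it to the third signature; `f`, `grad_f`, `Hess_f`, `M`, and `p0` are the illustrative definitions from before.

```julia
# Sketch: bundling cost, gradient, and Hessian into a ManifoldHessianObjective.
mho = ManifoldHessianObjective(f, grad_f, Hess_f)
q = adaptive_regularization_with_cubics(M, mho, p0)
```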
Keyword arguments

The default values are given in brackets.

- `σ`: (`100.0 / sqrt(manifold_dimension(M))`) initial regularization parameter
- `σmin`: (`1e-10`) minimal regularization value $σ_{\min}$
- `η1`: (`0.1`) lower model success threshold
- `η2`: (`0.9`) upper model success threshold
- `γ1`: (`0.1`) regularization reduction factor (for the success case)
- `γ2`: (`2.0`) regularization increment factor (for the non-success case)
- `evaluation`: (`AllocatingEvaluation`) specify whether the gradient works by allocation (default), that is of the form `grad_f(M, p)`, or in place (`InplaceEvaluation`), that is of the form `grad_f!(M, X, p)`, and analogously for the Hessian
- `retraction_method`: (`default_retraction_method(M, typeof(p))`) a retraction to use
- `initial_tangent_vector`: (`zero_vector(M, p)`) initialize any tangent vector data
- `maxIterLanczos`: (`200`) a shortcut to set the stopping criterion in the sub solver
- `ρ_regularization`: (`1e3`) a regularization to avoid dividing by zero for small values of cost and model
- `stopping_criterion`: (`StopAfterIteration(40) | StopWhenGradientNormLess(1e-9) | StopWhenAllLanczosVectorsUsed(maxIterLanczos)`)
- `sub_state`: (`LanczosState(M, copy(M, p); maxIterLanczos=maxIterLanczos, σ=σ)`) a state for the subproblem, or an `AbstractEvaluationType` if the problem is a function
- `sub_objective`: a shortcut to modify the objective of the subproblem used within the `sub_problem`
- `sub_problem`: (`DefaultManoptProblem(M, sub_objective)`) the problem (or a function) for the sub problem
All other keyword arguments are passed to `decorate_state!` for state decorators or `decorate_objective!` for objective decorators, respectively. If you provide the `ManifoldGradientObjective` directly, these decorations can still be specified.

By default the `debug=` keyword is set to `DebugIfEntry(:ρ_denominator, >(0); message="Denominator nonpositive", type=:error)` to avoid that the denominator in the computation of ρ becomes nonpositive due to rounding errors.
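A few of these keywords can be combined as in the following sketch; the concrete values are arbitrary, and `M`, `f`, `grad_f`, `Hess_f`, `p0` are the illustrative definitions from above.

```julia
# Sketch: passing selected keyword arguments (values chosen only for illustration).
q = adaptive_regularization_with_cubics(
    M, f, grad_f, Hess_f, p0;
    σ=10.0,                              # initial regularization parameter
    maxIterLanczos=100,                  # shortcut for the sub solver stopping criterion
    stopping_criterion=StopAfterIteration(60) | StopWhenGradientNormLess(1e-10),
    debug=[:Iteration, :Cost, "\n"],     # print iteration number and cost each iteration
)
```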
Manopt.adaptive_regularization_with_cubics! — Function

    adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p; kwargs...)
    adaptive_regularization_with_cubics!(M, f, grad_f, p; kwargs...)
    adaptive_regularization_with_cubics!(M, mho, p; kwargs...)

Evaluate the Riemannian adaptive regularization with cubics solver in place of `p`.
Input

- `M`: a manifold $\mathcal M$
- `f`: a cost function $F: \mathcal M → ℝ$ to minimize
- `grad_f`: the gradient $\operatorname{grad}F: \mathcal M → T \mathcal M$ of $F$
- `Hess_f`: (optional) the Hessian $H( \mathcal M, x, ξ)$ of $F$
- `p`: an initial value $p ∈ \mathcal M$

If no Hessian is provided, it is approximated by finite differences, see `ApproxHessianFiniteDifference`.

The cost `f` and its gradient and Hessian might also be provided as a `ManifoldHessianObjective`.

For more details and all options, see `adaptive_regularization_with_cubics`.
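A short sketch of the in-place call, reusing the illustrative setup from above; the point `p` is overwritten with the result.

```julia
# Sketch: the in-place variant modifies p directly.
p = copy(M, p0)
adaptive_regularization_with_cubics!(M, f, grad_f, Hess_f, p)
```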
State
Manopt.AdaptiveRegularizationState — Type

    AdaptiveRegularizationState{P,T} <: AbstractHessianSolverState

A state for the adaptive_regularization_with_cubics solver.
Fields
A default value is given in brackets if a parameter can be left out in initialization.

- `η1`, `η2`: (`0.1`, `0.9`) bounds for evaluating the regularization parameter
- `γ1`, `γ2`: (`0.1`, `2.0`) shrinking and expansion factors for the regularization parameter `σ`
- `p`: (`rand(M)`) the current iterate
- `X`: (`zero_vector(M,p)`) the current gradient $\operatorname{grad}f(p)$
- `s`: (`zero_vector(M,p)`) the tangent vector step resulting from minimizing the model problem in the tangent space $\mathcal T_{p} \mathcal M$
- `σ`: the current cubic regularization parameter
- `σmin`: (`1e-7`) lower bound for the cubic regularization parameter
- `ρ_regularization`: (`1e3`) regularization parameter for computing ρ. When approaching convergence, ρ may be difficult to compute since numerator and denominator both approach zero. Regularizing the ratio lets ρ go to 1 near convergence.
- `evaluation`: (`AllocatingEvaluation()`) whether the objective functions are evaluated allocating or in place
- `retraction_method`: (`default_retraction_method(M)`) the retraction to use
- `stopping_criterion`: (`StopAfterIteration(100)`) a `StoppingCriterion`
- `sub_problem`: sub problem solved in each iteration
- `sub_state`: sub state for solving the sub problem, either a solver state if the problem is an `AbstractManoptProblem`, or an `AbstractEvaluationType` if it is a function, where it defaults to `AllocatingEvaluation`
Furthermore the following internal fields are defined:

- `q`: (`copy(M,p)`) a point for the candidates to evaluate model and ρ
- `H`: (`copy(M, p, X)`) the current Hessian evaluation, $\operatorname{Hess}F(p)[⋅]$
- `S`: (`copy(M, p, X)`) the current solution from the subsolver
- `ρ`: the current regularized ratio of actual improvement and model improvement
- `ρ_denominator`: (`one(ρ)`) a value to store the denominator from the computation of ρ, to allow for a warning or error when this value is non-positive
Constructor
    AdaptiveRegularizationState(M, p=rand(M); X=zero_vector(M, p), kwargs...)

Construct the solver state with all fields stated above as keyword arguments.
Sub solvers
There are several ways to approach the subsolver. The default is the first one.
Lanczos iteration
Manopt.LanczosState — Type

    LanczosState{P,T,SC,B,I,R,TM,V,Y} <: AbstractManoptSolverState

Solve the adaptive regularized subproblem with a Lanczos iteration.

Fields

- `stop`: the stopping criterion
- `σ`: the current regularization parameter
- `X`: the iterate
- `Lanczos_vectors`: the obtained Lanczos vectors
- `tridig_matrix`: the tridiagonal coefficient matrix T
- `coefficients`: the coefficients $y_1,\ldots,y_k$ that determine the solution
- `Hp`: a temporary vector containing the evaluation of the Hessian
- `Hp_residual`: a temporary vector containing the residual to the Hessian
- `S`: the current obtained / approximated solution
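The Lanczos sub solver can also be configured explicitly through the `sub_state` keyword. The following sketch mirrors the default stated in the keyword arguments above, only with smaller illustrative values; `M`, `f`, `grad_f`, `Hess_f`, and `p0` are again the illustrative definitions from the first example.

```julia
# Sketch: passing a customized Lanczos sub solver state.
q = adaptive_regularization_with_cubics(
    M, f, grad_f, Hess_f, p0;
    maxIterLanczos=100,  # keep the outer safeguard criterion consistent
    sub_state=LanczosState(M, copy(M, p0); maxIterLanczos=100, σ=10.0),
)
```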
(Conjugate) gradient descent
There is a generic objective that implements the sub problem:
Manopt.AdaptiveRagularizationWithCubicsModelObjective — Type

    AdaptiveRagularizationWithCubicsModelObjective

A model for the adaptive regularization with cubics,

\[m(X) = f(p) + ⟨\operatorname{grad} f(p), X ⟩_p + \frac{1}{2} ⟨\operatorname{Hess} f(p)[X], X⟩_p + \frac{σ}{3} \lVert X \rVert^3,\]

cf. Eq. (33) in [ABBC20].
Fields

- `objective`: an `AbstractManifoldHessianObjective` providing $f$, its gradient and Hessian
- `σ`: the current (cubic) regularization parameter
Constructors

    AdaptiveRagularizationWithCubicsModelObjective(mho, σ=1.0)

with either an `AbstractManifoldHessianObjective` objective or a decorator containing such an objective.

Since the sub problem is given on the tangent space, you have to provide

    arc_obj = AdaptiveRagularizationWithCubicsModelObjective(mho, σ)
    sub_problem = DefaultProblem(TangentSpaceAt(M,p), arc_obj)

where `mho` is the Hessian objective of `f` to solve. Then use this for the `sub_problem` keyword and use your favourite gradient based solver for the `sub_state` keyword, for example a `ConjugateGradientDescentState`.
Additional stopping criteria
Manopt.StopWhenAllLanczosVectorsUsed — Type

    StopWhenAllLanczosVectorsUsed <: StoppingCriterion

When an inner iteration has used up all Lanczos vectors, this stopping criterion serves as a fallback / security stopping criterion to avoid accessing a non-existing field in the array allocated for vectors.

Note that this stopping criterion is (for now) only implemented for the case that an AdaptiveRegularizationState uses a LanczosState subsolver.

Fields

- `maxLanczosVectors`: maximal number of Lanczos vectors
- `reason`: a String indicating the reason if the criterion indicated to stop

Constructor

    StopWhenAllLanczosVectorsUsed(maxLanczosVectors::Int)
Manopt.StopWhenFirstOrderProgress — Type

    StopWhenFirstOrderProgress <: StoppingCriterion

A stopping criterion related to the Riemannian adaptive regularization with cubics (ARC) solver indicating that the model function at the current (outer) iterate,

\[ m(X) = f(p) + ⟨X, \operatorname{grad}f(p)⟩ + \frac{1}{2} ⟨X, \operatorname{Hess} f(p)[X]⟩ + \frac{σ}{3} \lVert X \rVert^3,\]

defined on the tangent space $T_{p}\mathcal M$, fulfills at the current iterate $X_k$ that

\[m(X_k) \leq m(0) \quad\text{ and }\quad \lVert \operatorname{grad} m(X_k) \rVert ≤ θ \lVert X_k \rVert^2.\]

Fields

- `θ`: the factor $θ$ in the second condition
- `reason`: a String indicating the reason if the criterion indicated to stop
Technical details
The adaptive_regularization_with_cubics solver requires the following functions of a manifold to be available:

- A `retract!(M, q, p, X)`; it is recommended to set the `default_retraction_method` to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
- If you do not provide an initial regularization parameter `σ`, a `manifold_dimension` is required.
- By default the tangent vector storing the gradient is initialized calling `zero_vector(M, p)`.
- `inner(M, p, X, Y)` is used within the algorithm step.

Furthermore, within the Lanczos subsolver, generating a random tangent vector (at `p`) using `rand!(M, X; vector_at=p)` in place of `X` is required.
Literature
- [ABBC20] N. Agarwal, N. Boumal, B. Bullins and C. Cartis. Adaptive regularization with cubics on manifolds. Mathematical Programming (2020).