Gradient descent
Manopt.gradient_descent — Function
gradient_descent(M, f, grad_f, p=rand(M); kwargs...)
gradient_descent(M, gradient_objective, p=rand(M); kwargs...)
gradient_descent!(M, f, grad_f, p; kwargs...)
gradient_descent!(M, gradient_objective, p; kwargs...)
Perform the gradient descent algorithm
\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr), \qquad k=0,1,…\]
where $s_k > 0$ denotes a step size.
The algorithm can be performed in-place of p.
Input
- M::AbstractManifold: a Riemannian manifold $\mathcal M$
- f: a cost function $f: \mathcal M → ℝ$ implemented as (M, p) -> v
- grad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X computing X in-place
- p: a point on the manifold $\mathcal M$
Instead of f and grad_f you can provide the corresponding AbstractManifoldFirstOrderObjective gradient_objective directly.
Keyword arguments
- differential=nothing: specify a function to evaluate the differential; by default $Df(p)[X] = ⟨\operatorname{grad}f(p),X⟩$ is used
- direction=IdentityUpdateRule(): specify a processing of the direction, for example Nesterov, MomentumGradient, or AverageGradient
- evaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating their result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since the first argument is usually the manifold, the modified argument is the second one. For example grad_f(M, p) allocates, while grad_f!(M, X, p) computes the result in-place of X.
- retraction_method=default_retraction_method(M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractions
- stepsize=default_stepsize(M, GradientDescentState): a functor inheriting from Stepsize to determine a step size
- stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8): a functor indicating that the stopping criterion is fulfilled
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$ storing the gradient at the current iterate
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively.
If you provide the ManifoldFirstOrderObjective directly, the evaluation= keyword is ignored. The decorations are still applied to the objective.
If you activate tutorial mode (cf. is_tutorial_mode), this solver provides additional debug warnings.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
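For example, a minimal usage sketch, assuming Manifolds.jl is loaded, with a hypothetical set of points pts on a sphere whose Riemannian mean is sought:

using Manopt, Manifolds
M = Sphere(2)
pts = [rand(M) for _ in 1:10]                     # hypothetical data points on the sphere
f(M, p) = sum(q -> distance(M, p, q)^2, pts) / 2  # cost: half the sum of squared distances
grad_f(M, p) = sum(q -> -log(M, p, q), pts)       # its Riemannian gradient
p_star = gradient_descent(M, f, grad_f, rand(M))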
Manopt.gradient_descent! — Function
gradient_descent(M, f, grad_f, p=rand(M); kwargs...)
gradient_descent(M, gradient_objective, p=rand(M); kwargs...)
gradient_descent!(M, f, grad_f, p; kwargs...)
gradient_descent!(M, gradient_objective, p; kwargs...)
Perform the gradient descent algorithm
\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr), \qquad k=0,1,…\]
where $s_k > 0$ denotes a step size.
The algorithm can be performed in-place of p.
Input
- M::AbstractManifold: a Riemannian manifold $\mathcal M$
- f: a cost function $f: \mathcal M → ℝ$ implemented as (M, p) -> v
- grad_f: the (Riemannian) gradient $\operatorname{grad}f: \mathcal M → T_{p}\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X computing X in-place
- p: a point on the manifold $\mathcal M$
Instead of f and grad_f you can provide the corresponding AbstractManifoldFirstOrderObjective gradient_objective directly.
Keyword arguments
- differential=nothing: specify a function to evaluate the differential; by default $Df(p)[X] = ⟨\operatorname{grad}f(p),X⟩$ is used
- direction=IdentityUpdateRule(): specify a processing of the direction, for example Nesterov, MomentumGradient, or AverageGradient
- evaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating their result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since the first argument is usually the manifold, the modified argument is the second one. For example grad_f(M, p) allocates, while grad_f!(M, X, p) computes the result in-place of X.
- retraction_method=default_retraction_method(M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractions
- stepsize=default_stepsize(M, GradientDescentState): a functor inheriting from Stepsize to determine a step size
- stopping_criterion=StopAfterIteration(200) | StopWhenGradientNormLess(1e-8): a functor indicating that the stopping criterion is fulfilled
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$ storing the gradient at the current iterate
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for objective decorators, respectively.
If you provide the ManifoldFirstOrderObjective directly, the evaluation= keyword is ignored. The decorations are still applied to the objective.
If you activate tutorial mode (cf. is_tutorial_mode), this solver provides additional debug warnings.
Output
The obtained approximate minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details, especially the return_state= keyword.
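A sketch of the in-place variant, reusing the hypothetical M, f, and grad_f from the example above; the start point p is overwritten with the result:

p = rand(M)                                       # start point, modified in place
gradient_descent!(M, f, grad_f, p;
    stopping_criterion=StopAfterIteration(100) | StopWhenGradientNormLess(1e-7),
)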
State
Manopt.GradientDescentState — Type
GradientDescentState{P,T} <: AbstractGradientSolverState
Describes the state of a gradient-based descent algorithm.
Fields
- p::P: a point on the manifold $\mathcal M$ storing the current iterate
- X::T: a tangent vector at the point $p$ on the manifold $\mathcal M$ storing the gradient at the current iterate
- stop::StoppingCriterion: a functor indicating that the stopping criterion is fulfilled
- stepsize::Stepsize: a functor inheriting from Stepsize to determine a step size
- direction::DirectionUpdateRule: a processor to handle the obtained gradient and compute a direction to “walk into”.
- retraction_method::AbstractRetractionMethod: a retraction $\operatorname{retr}$ to use, see the section on retractions
Constructor
GradientDescentState(M::AbstractManifold; kwargs...)
Initialize the gradient descent solver state, where
Input
- M::AbstractManifold: a Riemannian manifold $\mathcal M$
Keyword arguments
- direction=IdentityUpdateRule()
- p=rand(M): a point on the manifold $\mathcal M$ to specify the initial value
- stopping_criterion=StopAfterIteration(100): a functor indicating that the stopping criterion is fulfilled
- stepsize=default_stepsize(M, GradientDescentState; retraction_method=retraction_method): a functor inheriting from Stepsize to determine a step size
- retraction_method=default_retraction_method(M, typeof(p)): a retraction $\operatorname{retr}$ to use, see the section on retractions
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$ to specify the representation of a tangent vector
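A minimal construction sketch, assuming Manifolds.jl provides the manifold; the stopping criterion is only an illustrative choice:

using Manopt, Manifolds
M = Sphere(2)
gds = GradientDescentState(M;
    p=rand(M),                                    # initial iterate
    stopping_criterion=StopAfterIteration(50),    # illustrative stopping criterion
)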
See also
Direction update rules
A field of the state is the direction, a DirectionUpdateRule, which by default (IdentityUpdateRule) just evaluates the gradient, but which can be enhanced, for example to
Manopt.AverageGradient — Function
AverageGradient(; kwargs...)
AverageGradient(M::AbstractManifold; kwargs...)
Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored; the average is taken after vector transporting them to the current iterate's tangent space.
Input
- M::AbstractManifold: a Riemannian manifold $\mathcal M$ (optional)
Keyword arguments
- p=rand(M): a point on the manifold $\mathcal M$ to specify the initial value
- direction=IdentityUpdateRule: preprocess the actual gradient before averaging
- gradients=[zero_vector(M, p) for _ in 1:n]: how to initialise the internal storage
- n=10: number of gradient evaluations to take the mean over
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$
- vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
This function generates a ManifoldDefaultsFactory for AverageGradientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold, for example from a corresponding AbstractManoptSolverState, is available.
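A usage sketch, reusing the hypothetical M, f, and grad_f from the gradient_descent example above:

q = gradient_descent(M, f, grad_f, rand(M);
    direction=AverageGradient(M; n=10),           # average over the last 10 gradients
)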
Manopt.DirectionUpdateRule — Type
DirectionUpdateRule
A general functor that handles direction update rules. Its fields are usually only a StoreStateAction by default initialized to the fields required for the specific coefficient, but can also be replaced by a (common, global) individual one that provides these values.
Manopt.IdentityUpdateRule — Type
IdentityUpdateRule <: DirectionUpdateRule
The default gradient direction update is the identity; it usually just evaluates the gradient.
You can also use Gradient() to create the corresponding factory, though this only delays this parameter-free instantiation to later.
Manopt.MomentumGradient — Function
MomentumGradient()
Append a momentum to a gradient processor, where the last direction and last iterate are stored and the new direction is composed as $η_i = m\,η_{i-1}' - s\,d_i$, where $s\,d_i$ is the current (inner) direction scaled by the step size and $η_{i-1}'$ is the last direction, vector transported to the current tangent space.
Input
- M (optional)
Keyword arguments
- p=rand(M): a point on the manifold $\mathcal M$
- direction=IdentityUpdateRule: preprocess the actual gradient before adding momentum
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$
- momentum=0.2: amount of momentum to use
- vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
This function generates a ManifoldDefaultsFactory for MomentumGradientRule. For default values that depend on the manifold, this factory postpones the construction until the manifold, for example from a corresponding AbstractManoptSolverState, is available.
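A usage sketch with the same hypothetical setup as above:

q = gradient_descent(M, f, grad_f, rand(M);
    direction=MomentumGradient(; momentum=0.2),   # add momentum to the gradient direction
)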
Manopt.Nesterov — Function
Nesterov(; kwargs...)
Nesterov(M::AbstractManifold; kwargs...)
Assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given
- a step size $h_k<\frac{1}{L}$ (from the GradientDescentState)
- a shrinkage parameter $β_k$
- and a current iterate $p_k$
- as well as the interim values $γ_k$ and $v_k$ from the previous iterate.
This computes a Nesterov-type update using the following steps, see [ZS18]:
- Compute the positive root $α_k∈(0,1)$ of $α^2 = h_k\bigl((1-α_k)γ_k+α_k μ\bigr)$.
- Set $\bar γ_{k+1} = (1-α_k)γ_k + α_kμ$
- $y_k = \operatorname{retr}_{p_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{p_k}v_k \Bigr)$
- $x_{k+1} = \operatorname{retr}_{y_k}(-h_k \operatorname{grad}f(y_k))$
- $v_{k+1} = \operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\bar γ_{k+1}}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\bar γ_{k+1}}\operatorname{grad}f(y_k) \Bigr)$
- $γ_{k+1} = \frac{1}{1+β_k}\bar γ_{k+1}$
Then the direction from $p_k$ to $p_{k+1}$, $d = \operatorname{retr}^{-1}_{p_k}p_{k+1}$, is returned.
Input
- M::AbstractManifold: a Riemannian manifold $\mathcal M$ (optional)
Keyword arguments
- p=rand(M): a point on the manifold $\mathcal M$ to specify the initial value
- γ=0.001
- μ=0.9
- shrinkage = k -> 0.8
- inverse_retraction_method=default_inverse_retraction_method(M, typeof(p)): an inverse retraction $\operatorname{retr}^{-1}$ to use, see the section on retractions and their inverses
This function generates a ManifoldDefaultsFactory for NesterovRule. For default values that depend on the manifold, this factory postpones the construction until the manifold, for example from a corresponding AbstractManoptSolverState, is available.
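A usage sketch with the same hypothetical setup; the value for μ is only illustrative:

q = gradient_descent(M, f, grad_f, rand(M);
    direction=Nesterov(M; μ=0.9),                 # Nesterov-type acceleration
)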
Manopt.PreconditionedDirection — Function
PreconditionedDirection(preconditioner; kwargs...)
PreconditionedDirection(M::AbstractManifold, preconditioner; kwargs...)
Add a preconditioner to a gradient processor, following the motivation from (preconditioned) optimization: a linear invertible map $P: T_{p}\mathcal M → T_{p}\mathcal M$ that usually should be
- symmetric: $⟨X, P(Y)⟩ = ⟨P(X), Y⟩$
- positive definite $⟨X, P(X)⟩ > 0$ for $X$ not the zero-vector
The gradient is then preconditioned as $P(X)$, where $X$ is either the gradient of the objective or the result of a previous (internally stored) gradient processor.
For example, if you provide the inverse of the Hessian, $\operatorname{Hess}^{-1} f$, as the preconditioner, you turn gradient descent into a Newton method.
Arguments
- M::AbstractManifold: a Riemannian manifold $\mathcal M$ (optional)
- preconditioner: preconditioner function, either as a (M, p, X) -> Y allocating or a (M, Y, p, X) -> Y mutating function
Keyword arguments
- direction=IdentityUpdateRule: internal DirectionUpdateRule to determine the gradients to store, or a ManifoldDefaultsFactory generating one
- evaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating their result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since the first argument is usually the manifold, the modified argument is the second one.
This function generates a ManifoldDefaultsFactory for PreconditionedDirectionRule. For default values that depend on the manifold, this factory postpones the construction until the manifold, for example from a corresponding AbstractManoptSolverState, is available.
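A sketch with the same hypothetical setup and a made-up preconditioner that merely scales the gradient; any map satisfying the conditions above and one of the two signatures works:

precond(M, p, X) = 0.5 .* X                       # hypothetical preconditioner: plain scaling
q = gradient_descent(M, f, grad_f, rand(M);
    direction=PreconditionedDirection(M, precond),
)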
which internally use the ManifoldDefaultsFactory and produce the internal elements
Manopt.AverageGradientRule — Type
AverageGradientRule <: DirectionUpdateRule
Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored. The average is taken after vector transporting them to the current iterate's tangent space.
Fields
- gradients: the last n gradient/direction updates
- last_iterate: last iterate (needed to transport the gradients)
- direction: internal DirectionUpdateRule to determine directions to apply the averaging to
- vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Constructors
AverageGradientRule(
    M::AbstractManifold;
    p::P=rand(M),
    n::Int=10,
    direction::Union{<:DirectionUpdateRule,ManifoldDefaultsFactory}=IdentityUpdateRule(),
    gradients = fill(zero_vector(M, p), n),
    last_iterate = deepcopy(p),
    vector_transport_method = default_vector_transport_method(M, typeof(p))
)
Add average to a gradient problem, where
- n: determines the size of averaging
- direction: the internal DirectionUpdateRule to determine the gradients to store
- gradients: can be pre-filled with some history
- last_iterate: stores the last iterate
- vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
Manopt.ConjugateDescentCoefficientRule — Type
ConjugateDescentCoefficientRule <: DirectionUpdateRule
A functor (problem, state, k) -> β_k to compute the conjugate gradient update coefficient adapted to manifolds
See also conjugate_gradient_descent
Constructor
ConjugateDescentCoefficientRule()
Construct the conjugate descent coefficient update rule; a new storage is created by default.
See also
Manopt.MomentumGradientRule — Type
MomentumGradientRule <: DirectionUpdateRule
Store the necessary information to compute the MomentumGradient direction update.
Fields
- p_old::P: a point on the manifold $\mathcal M$
- momentum::Real: factor for the momentum
- direction: internal DirectionUpdateRule to determine directions to add the momentum to.
- vector_transport_method::AbstractVectorTransportMethod: a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
- X_old::T: a tangent vector at the point $p$ on the manifold $\mathcal M$
Constructors
MomentumGradientRule(M::AbstractManifold; kwargs...)
Initialize a momentum gradient rule, where s is the inner direction update and p and X serve as memory for interim values.
Keyword arguments
- p=rand(M): a point on the manifold $\mathcal M$
- s=IdentityUpdateRule()
- momentum=0.2
- vector_transport_method=default_vector_transport_method(M, typeof(p)): a vector transport $\mathcal T_{⋅←⋅}$ to use, see the section on vector transports
- X=zero_vector(M, p): a tangent vector at the point $p$ on the manifold $\mathcal M$
See also
Manopt.NesterovRule — Type
NesterovRule <: DirectionUpdateRule
Compute a Nesterov-inspired direction update. See Nesterov for details.
Fields
- γ::Real, μ::Real: coefficients from the last iterate
- v::P: an interim point to compute the next gradient evaluation point y_k
- shrinkage: a function k -> ... to compute the shrinkage $β_k$ per iterate k
- inverse_retraction_method=default_inverse_retraction_method(M, typeof(p)): an inverse retraction $\operatorname{retr}^{-1}$ to use, see the section on retractions and their inverses
Constructor
NesterovRule(M::AbstractManifold; kwargs...)
Keyword arguments
- p=rand(M): a point on the manifold $\mathcal M$ to specify the initial value
- γ=0.001
- μ=0.9
- shrinkage = k -> 0.8
- inverse_retraction_method=default_inverse_retraction_method(M, typeof(p)): an inverse retraction $\operatorname{retr}^{-1}$ to use, see the section on retractions and their inverses
See also
Manopt.PreconditionedDirectionRule — Type
PreconditionedDirectionRule{E<:AbstractEvaluationType} <: DirectionUpdateRule
Add a preconditioning as gradient processor, see PreconditionedDirection for more mathematical background.
Fields
- direction: internal DirectionUpdateRule to determine directions to apply the preconditioning to
- preconditioner: the preconditioner function
Constructors
PreconditionedDirectionRule(
    M::AbstractManifold,
    preconditioner;
    direction::Union{<:DirectionUpdateRule,ManifoldDefaultsFactory}=IdentityUpdateRule(),
    evaluation::AbstractEvaluationType=AllocatingEvaluation()
)
Add preconditioning to a gradient problem.
Input
- M::AbstractManifold: a Riemannian manifold $\mathcal M$
- preconditioner: preconditioner function, either as a (M, p, X) -> Y allocating or a (M, Y, p, X) -> Y mutating function
Keyword arguments
- evaluation=AllocatingEvaluation(): specify whether the functions that return an array, for example a point or a tangent vector, work by allocating their result (AllocatingEvaluation) or whether they modify their input argument to return the result therein (InplaceEvaluation). Since the first argument is usually the manifold, the modified argument is the second one.
- direction=IdentityUpdateRule: internal DirectionUpdateRule to determine the gradients to store, or a ManifoldDefaultsFactory generating one
Debug actions
Manopt.DebugGradient — Type
DebugGradient <: DebugAction
Debug output for the gradient evaluated at the current iterate.
Constructors
DebugGradient(; long=false, prefix= , format= "$prefix%s", io=stdout)
Display the short (false) or long (true) default text for the gradient, or set the prefix manually. Alternatively the complete format can be set.
Manopt.DebugGradientNorm — Type
DebugGradientNorm <: DebugAction
Debug output for the norm of the gradient evaluated at the current iterate.
Constructors
DebugGradientNorm([long=false, p=print])
Display the short (false) or long (true) default text for the gradient norm.
DebugGradientNorm(prefix[, p=print])
Display a prefix in front of the gradient norm.
Manopt.DebugStepsize — Type
DebugStepsize <: DebugAction
Debug output for the current step size.
Constructors
DebugStepsize(; long=false, prefix="step size:", format="$prefix%s", io=stdout)
Display a prefix in front of the step size.
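A sketch of attaching these debug actions via the debug= keyword, again with the hypothetical setup from above; the trailing 10 prints the output only every tenth iteration:

q = gradient_descent(M, f, grad_f, rand(M);
    debug=[:Iteration, DebugGradientNorm(), DebugStepsize(), "\n", 10],
)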
Record actions
Manopt.RecordGradient — Type
RecordGradient <: RecordAction
Record the gradient evaluated at the current iterate.
Constructors
RecordGradient(ξ)
Initialize the RecordAction to the corresponding type of the tangent vector.
Manopt.RecordGradientNorm — Type
RecordGradientNorm <: RecordAction
Record the norm of the current gradient.
Manopt.RecordStepsize — Type
RecordStepsize <: RecordAction
Record the step size.
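A sketch of recording values per iteration via the record= keyword, with the hypothetical setup from above; return_state=true returns the full solver state, so the recorded values can be retrieved with get_record:

s = gradient_descent(M, f, grad_f, rand(M);
    record=[RecordGradientNorm(), RecordStepsize()],
    return_state=true,                            # return the full state to access the records
)
rec = get_record(s)                               # recorded gradient norms and step sizes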
Technical details
The gradient_descent solver requires the following functions of a manifold to be available
- A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
- By default gradient descent uses ArmijoLinesearch, which requires max_stepsize(M) to be set and an implementation of inner(M, p, X).
- By default the stopping criterion uses the norm as well, to stop when the norm of the gradient is small, but if you implemented inner, the norm is provided already.
- By default the tangent vector storing the gradient is initialized by calling zero_vector(M, p).
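For example, a sketch of passing an explicit retraction, assuming the sphere from the examples above, for which ProjectionRetraction from ManifoldsBase.jl is available:

q = gradient_descent(M, f, grad_f, rand(M);
    retraction_method=ProjectionRetraction(),     # explicit retraction choice for this manifold
)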
Literature
- [Lue72]
- D. G. Luenberger. The gradient projection method along geodesics. Management Science 18, 620–631 (1972).
- [ZS18]
- H. Zhang and S. Sra. Towards Riemannian accelerated gradient methods, arXiv Preprint, 1806.02812 (2018).