Gradient descent
Manopt.gradient_descent — Function

gradient_descent(M, f, grad_f, p=rand(M); kwargs...)
gradient_descent(M, gradient_objective, p=rand(M); kwargs...)

perform a gradient descent

\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr), \qquad k=0,1,…\]

with different choices of the stepsize $s_k$ available (see the stepsize option below).
Input

- M: a manifold $\mathcal M$
- f: a cost function $f: \mathcal M → ℝ$ to find a minimizer $p^*$ for
- grad_f: the gradient $\operatorname{grad}f: \mathcal M → T\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X
- p: an initial value $p = p_0 ∈ \mathcal M$

Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.
Optional

- direction: (IdentityUpdateRule) perform a processing of the direction, for example adding momentum or averaging (see the direction update rules below)
- evaluation: (AllocatingEvaluation) specify whether the gradient works by allocation (default) of the form grad_f(M, p) or InplaceEvaluation in place of the form grad_f!(M, X, p)
- retraction_method: (default_retraction_method(M, typeof(p))) a retraction to use
- stepsize: (default_stepsize(M, GradientDescentState)) a Stepsize
- stopping_criterion: (StopAfterIteration(200) | StopWhenGradientNormLess(1e-8)) a functor inheriting from StoppingCriterion indicating when to stop
- X: (zero_vector(M, p)) provide memory and/or type of the gradient to use

If you provide the ManifoldGradientObjective directly, evaluation is ignored.
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for the objective, respectively. If you provide the ManifoldGradientObjective directly, these decorations can still be specified.

Output

the obtained (approximate) minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details.
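As a minimal usage sketch (the Sphere manifold from Manifolds.jl, the random data, and the seed are illustrative assumptions, not part of this solver's API), the Riemannian center of mass of a few points can be computed as follows:

using Manopt, Manifolds, Random

Random.seed!(42)
M = Sphere(2)                       # the unit sphere S² in ℝ³
data = [rand(M) for _ in 1:20]      # illustrative data points

# cost: half the mean squared Riemannian distance to the data points
f(M, p) = sum(distance(M, p, q)^2 for q in data) / (2length(data))
# its Riemannian gradient: the negative mean of the logarithmic maps
grad_f(M, p) = -sum(log(M, p, q) for q in data) / length(data)

p_star = gradient_descent(M, f, grad_f, data[1])

Unless overwritten by keywords, the defaults listed above apply, in particular the stopping criterion StopAfterIteration(200) | StopWhenGradientNormLess(1e-8).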
Manopt.gradient_descent! — Function

gradient_descent!(M, f, grad_f, p; kwargs...)
gradient_descent!(M, gradient_objective, p; kwargs...)

perform a gradient descent

\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr)\]

in place of p, with different choices of $s_k$ available.
Input

- M: a manifold $\mathcal M$
- f: a cost function $f:\mathcal M→ℝ$ to minimize
- grad_f: the gradient $\operatorname{grad}f:\mathcal M→ T\mathcal M$ of f
- p: an initial value $p ∈ \mathcal M$

Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.

For more options, especially Stepsizes for $s_k$, see gradient_descent.
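For instance, reusing M, f, grad_f, and data from the sketch above, the in-place variant overwrites its start point:

q = copy(M, data[1])                # q is modified to hold the result
gradient_descent!(M, f, grad_f, q)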
State
Manopt.GradientDescentState — Type

GradientDescentState{P,T} <: AbstractGradientSolverState

Describes a gradient-based descent algorithm, with

Fields

A default value is given in brackets if a parameter can be left out in initialization.

- p: (rand(M)) the current iterate
- X: (zero_vector(M,p)) the current gradient $\operatorname{grad}f(p)$, initialised to the zero vector
- stopping_criterion: (StopAfterIteration(100)) a StoppingCriterion
- stepsize: (default_stepsize(M, GradientDescentState)) a Stepsize
- direction: (IdentityUpdateRule) a processor to compute the gradient
- retraction_method: (default_retraction_method(M, typeof(p))) the retraction to use, defaults to the default set for your manifold
Constructor

GradientDescentState(M, p=rand(M); X=zero_vector(M, p), kwargs...)

Generate gradient descent options, where X can be used to set the tangent vector to store the gradient in a certain type. All other fields are keyword arguments.

See also

gradient_descent
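The state is mostly handled internally, but it can also be driven by hand. The following sketch (reusing M, f, grad_f, and data from the earlier example; the stopping criterion is an illustrative choice) wraps cost and gradient in an objective, builds a problem, and runs the solver on a manually constructed state:

# assemble objective, problem, and state by hand, then solve
obj = ManifoldGradientObjective(f, grad_f)
problem = DefaultManoptProblem(M, obj)
state = GradientDescentState(M, copy(M, data[1]);
    stopping_criterion=StopAfterIteration(100))
solve!(problem, state)
p_star = get_solver_result(state)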
Direction update rules

A field of the options is the direction, a DirectionUpdateRule, which by default (IdentityUpdateRule) just evaluates the gradient but can be enhanced, for example, by the momentum, averaging, or Nesterov rules documented below.

Manopt.DirectionUpdateRule — Type

DirectionUpdateRule

A general functor that handles direction update rules. Its fields are usually only a StoreStateAction, by default initialized to the fields required for the specific coefficient, but it can also be replaced by a (common, global) individual one that provides these values.

Manopt.IdentityUpdateRule — Type

IdentityUpdateRule <: DirectionUpdateRule

The default gradient direction update is the identity; it just evaluates the gradient.
Manopt.MomentumGradient — Type

MomentumGradient <: DirectionUpdateRule

Append a momentum to a gradient processor, where the last direction and last iterate are stored and the new direction is composed as $η_i = m η_{i-1}' - s d_i$, where $s d_i$ is the current (inner) direction including the step size and $η_{i-1}'$ is the last direction, vector transported to the current tangent space and weighted by the momentum $m$.

Fields

- p_old: (rand(M)) remember the last iterate for parallel transporting the last direction
- momentum: (0.2) factor for the momentum
- direction: internal DirectionUpdateRule to determine the directions to add the momentum to
- vector_transport_method: (default_vector_transport_method(M, typeof(p))) vector transport method to use
- X_old: (zero_vector(M, p_old)) the last gradient/direction update, added as momentum
Constructors

Add momentum to a gradient problem, where by default just a gradient evaluation is used.

MomentumGradient(
    M::AbstractManifold;
    p=rand(M),
    s::DirectionUpdateRule=IdentityUpdateRule(),
    X=zero_vector(M, p),
    momentum=0.2,
    vector_transport_method=default_vector_transport_method(M, typeof(p)),
)

Initialize a momentum gradient rule to s, where p and X are memory for interim values.
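As a sketch of how such a rule is used (reusing the objects from the first example; the parameter values are illustrative), it is passed via the direction keyword of gradient_descent:

# gradient descent with a momentum term added to each direction
p_m = gradient_descent(
    M, f, grad_f, data[1];
    direction=MomentumGradient(M; p=data[1], momentum=0.2),
)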
Manopt.AverageGradient — Type

AverageGradient <: DirectionUpdateRule

Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored; the average is taken after vector transporting them to the current iterate's tangent space.

Fields

- gradients: the last n gradient/direction updates
- last_iterate: last iterate (needed to transport the gradients)
- direction: internal DirectionUpdateRule to determine directions to apply the averaging to
- vector_transport_method: vector transport method to use
Constructors

AverageGradient(
    M::AbstractManifold,
    p::P=rand(M);
    n::Int=10,
    s::DirectionUpdateRule=IdentityUpdateRule(),
    gradients=fill(zero_vector(M, p), n),
    last_iterate=deepcopy(p),
    vector_transport_method=default_vector_transport_method(M, typeof(p)),
)

Add average to a gradient problem, where

- n: determines the size of averaging
- s: is the internal DirectionUpdateRule to determine the gradients to store
- gradients: can be pre-filled with some history
- last_iterate: stores the last iterate
- vector_transport_method: determines how to transport all gradients to the current iterate's tangent space before averaging
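Analogously, an averaging rule can be passed via the same keyword; again a sketch with illustrative values, reusing the objects from the first example:

# average the last 10 transported gradients before each step
p_avg = gradient_descent(
    M, f, grad_f, data[1];
    direction=AverageGradient(M, data[1]; n=10),
)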
Manopt.Nesterov — Type

Nesterov <: DirectionUpdateRule

Fields

- γ
- μ: the strong convexity coefficient
- v: (=$v_k$, $v_0=x_0$) an interim point to compute the next gradient evaluation point $y_k$
- shrinkage: (= i -> 0.8) a function to compute the shrinkage $β_k$ per iterate

Assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given

- a step size $h_k<\frac{1}{L}$ (from the GradientDescentState),
- a shrinkage parameter $β_k$,
- a current iterate $x_k$,
- as well as the interim values $γ_k$ and $v_k$ from the previous iterate,
this computes a Nesterov-type update using the following steps, see [ZS18]:

- Compute the positive root $α_k∈(0,1)$ of $α^2 = h_k\bigl((1-α_k)γ_k+α_k μ\bigr)$.
- Set $\bar γ_{k+1} = (1-α_k)γ_k + α_kμ$.
- $y_k = \operatorname{retr}_{x_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{x_k}v_k \Bigr)$
- $x_{k+1} = \operatorname{retr}_{y_k}(-h_k \operatorname{grad}f(y_k))$
- $v_{k+1} = \operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\bar γ_{k+1}}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\bar γ_{k+1}}\operatorname{grad}f(y_k) \Bigr)$
- $γ_{k+1} = \frac{1}{1+β_k}\bar γ_{k+1}$

Then the direction from $x_k$ to $x_{k+1}$, given by $d = \operatorname{retr}^{-1}_{x_k}x_{k+1}$, is returned.
Constructor

Nesterov(M::AbstractManifold, p::P; γ=0.001, μ=0.9, shrinkage=k -> 0.8,
    inverse_retraction_method=LogarithmicInverseRetraction())

Initialize the Nesterov acceleration, where p initializes v.
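A corresponding sketch for the accelerated rule, with illustrative parameter values and again reusing the objects from the first example; recall that the analysis above assumes strong convexity:

# Nesterov-type acceleration as the direction update
p_n = gradient_descent(
    M, f, grad_f, data[1];
    direction=Nesterov(M, data[1]; γ=0.001, μ=0.9),
)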
Debug actions

Manopt.DebugGradient — Type

DebugGradient <: DebugAction

debug for the gradient evaluated at the current iterate

Constructors

DebugGradient(; long=false, prefix= , format= "$prefix%s", io=stdout)

display the short (false) or long (true) default text for the gradient, or set the prefix manually. Alternatively the complete format can be set.
Manopt.DebugGradientNorm — Type

DebugGradientNorm <: DebugAction

debug for the norm of the gradient evaluated at the current iterate.

Constructors

DebugGradientNorm([long=false, p=print])

display the short (false) or long (true) default text for the gradient norm.

DebugGradientNorm(prefix[, p=print])

display a prefix in front of the gradient norm.

Manopt.DebugStepsize — Type

DebugStepsize <: DebugAction

debug for the current step size.

Constructors

DebugStepsize(; long=false, prefix="step size:", format="$prefix%s", io=stdout)

display a prefix in front of the step size.
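These actions are handed to the solver through the debug keyword, mixed with symbols and strings as separators; a sketch reusing the first example:

# print iteration number, gradient norm, and step size in every iteration
gradient_descent(
    M, f, grad_f, data[1];
    debug=[:Iteration, " | ", DebugGradientNorm(), " | ", DebugStepsize(), "\n"],
)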
Record actions

Manopt.RecordGradient — Type

RecordGradient <: RecordAction

record the gradient evaluated at the current iterate

Constructors

RecordGradient(ξ)

initialize the RecordAction to the corresponding type of the tangent vector.

Manopt.RecordGradientNorm — Type

RecordGradientNorm <: RecordAction

record the norm of the current gradient

Manopt.RecordStepsize — Type

RecordStepsize <: RecordAction

record the step size
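Record actions are passed via the record keyword; to access the recorded values afterwards, return the whole solver state. A sketch reusing the first example:

# record the gradient norm and the step size in every iteration
R = gradient_descent(
    M, f, grad_f, data[1];
    record=[RecordGradientNorm(), RecordStepsize()],
    return_state=true,
)
rec = get_record(R)     # the recorded values, one entry per iteration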
Technical details

The gradient_descent solver requires the following functions of a manifold to be available

- A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
- By default gradient descent uses ArmijoLinesearch, which requires max_stepsize(M) to be set and an implementation of inner(M, p, X).
- By default the stopping criterion uses the norm as well, to stop when the norm of the gradient is small; if you implemented inner, the norm is provided already.
- By default the tangent vector storing the gradient is initialized by calling zero_vector(M, p).
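On a manifold from Manifolds.jl these requirements are typically already met; a quick check on the sphere (an illustrative choice) could look as follows:

using Manifolds, Manopt

M = Sphere(2)
p = rand(M)
X = rand(M; vector_at=p)                  # a tangent vector at p

default_retraction_method(M, typeof(p))   # retraction used if none is specified
q = retract(M, p, X)                      # allocating variant of retract!
inner(M, p, X, X)                         # inner product; also yields norm(M, p, X)
zero_vector(M, p)                         # initial memory for the gradient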
Literature
- [Lue72] D. G. Luenberger. The gradient projection method along geodesics. Management Science 18, 620–631 (1972).
- [ZS18] H. Zhang and S. Sra. Towards Riemannian accelerated gradient methods, arXiv preprint arXiv:1806.02812 (2018).