Gradient descent
Manopt.gradient_descent — Function

gradient_descent(M, f, grad_f, p=rand(M); kwargs...)
gradient_descent(M, gradient_objective, p=rand(M); kwargs...)

perform a gradient descent

\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr), \qquad k=0,1,…\]

with different choices of the stepsize $s_k$ available (see the stepsize option below).
Input

- M: a manifold $\mathcal M$
- f: a cost function $f: \mathcal M → ℝ$ to find a minimizer $p^*$ for
- grad_f: the gradient $\operatorname{grad}f: \mathcal M → T\mathcal M$ of f as a function (M, p) -> X or a function (M, X, p) -> X
- p: an initial value $p = p_0 ∈ \mathcal M$

Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.
Optional

- direction: (IdentityUpdateRule) perform a processing of the direction, for example adding momentum or averaging (see the direction update rules below)
- evaluation: (AllocatingEvaluation) specify whether the gradient works by allocation (default) of the form grad_f(M, p) or InplaceEvaluation in place of the form grad_f!(M, X, p)
- retraction_method: (default_retraction_method(M, typeof(p))) a retraction to use
- stepsize: (default_stepsize(M, GradientDescentState)) a Stepsize
- stopping_criterion: (StopAfterIteration(200) | StopWhenGradientNormLess(1e-8)) a functor inheriting from StoppingCriterion indicating when to stop
- X: (zero_vector(M, p)) provide memory and/or type of the gradient to use

If you provide the ManifoldGradientObjective directly, evaluation is ignored.
All other keyword arguments are passed to decorate_state! for state decorators or decorate_objective! for the objective, respectively. If you provide the ManifoldGradientObjective directly, these decorations can still be specified.

Output

the obtained (approximate) minimizer $p^*$. To obtain the whole final state of the solver, see get_solver_return for details.
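As a minimal usage sketch (the Sphere manifold from Manifolds.jl, the random data, and the seed are illustrative assumptions, not part of this solver's API), the Riemannian center of mass of a few points can be computed as follows:

using Manopt, Manifolds, Random

Random.seed!(42)
M = Sphere(2)                       # the unit sphere S² in ℝ³
data = [rand(M) for _ in 1:20]      # illustrative data points

# cost: half the mean squared Riemannian distance to the data points
f(M, p) = sum(distance(M, p, q)^2 for q in data) / (2length(data))
# its Riemannian gradient: the negative mean of the logarithmic maps
grad_f(M, p) = -sum(log(M, p, q) for q in data) / length(data)

p_star = gradient_descent(M, f, grad_f, data[1])

Unless overwritten by keywords, the defaults listed above apply, in particular the stopping criterion StopAfterIteration(200) | StopWhenGradientNormLess(1e-8).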
Manopt.gradient_descent! — Function

gradient_descent!(M, f, grad_f, p; kwargs...)
gradient_descent!(M, gradient_objective, p; kwargs...)

perform a gradient descent

\[p_{k+1} = \operatorname{retr}_{p_k}\bigl( -s_k\operatorname{grad}f(p_k) \bigr)\]

in place of p, with different choices of $s_k$ available.
Input

- M: a manifold $\mathcal M$
- f: a cost function $f:\mathcal M→ℝ$ to minimize
- grad_f: the gradient $\operatorname{grad}f:\mathcal M→ T\mathcal M$ of f
- p: an initial value $p ∈ \mathcal M$

Alternatively to f and grad_f you can provide the AbstractManifoldGradientObjective gradient_objective directly.

For more options, especially Stepsizes for $s_k$, see gradient_descent.
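For instance, reusing M, f, grad_f, and data from the sketch above, the in-place variant overwrites its start point:

q = copy(M, data[1])                # q is modified to hold the result
gradient_descent!(M, f, grad_f, q)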
State
Manopt.GradientDescentState — Type

GradientDescentState{P,T} <: AbstractGradientSolverState

Describes a gradient-based descent algorithm, with

Fields

A default value is given in brackets if a parameter can be left out in initialization.

- p: (rand(M)) the current iterate
- X: (zero_vector(M,p)) the current gradient $\operatorname{grad}f(p)$, initialised to the zero vector
- stopping_criterion: (StopAfterIteration(100)) a StoppingCriterion
- stepsize: (default_stepsize(M, GradientDescentState)) a Stepsize
- direction: (IdentityUpdateRule) a processor to compute the gradient
- retraction_method: (default_retraction_method(M, typeof(p))) the retraction to use, defaults to the default set for your manifold
Constructor

GradientDescentState(M, p=rand(M); X=zero_vector(M, p), kwargs...)

Generate gradient descent options, where X can be used to set the tangent vector to store the gradient in a certain type. All other fields are keyword arguments.

See also

gradient_descent
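The state is mostly handled internally, but it can also be driven by hand. The following sketch (reusing M, f, grad_f, and data from the earlier example; the stopping criterion is an illustrative choice) wraps cost and gradient in an objective, builds a problem, and runs the solver on a manually constructed state:

# assemble objective, problem, and state by hand, then solve
obj = ManifoldGradientObjective(f, grad_f)
problem = DefaultManoptProblem(M, obj)
state = GradientDescentState(M, copy(M, data[1]);
    stopping_criterion=StopAfterIteration(100))
solve!(problem, state)
p_star = get_solver_result(state)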
Direction update rules

A field of the options is the direction, a DirectionUpdateRule, which by default (IdentityUpdateRule) just evaluates the gradient but can be enhanced, for example, by the momentum, averaging, or Nesterov rules documented below.

Manopt.DirectionUpdateRule — Type

DirectionUpdateRule

A general functor that handles direction update rules. Its fields are usually only a StoreStateAction, by default initialized to the fields required for the specific coefficient, but it can also be replaced by a (common, global) individual one that provides these values.

Manopt.IdentityUpdateRule — Type

IdentityUpdateRule <: DirectionUpdateRule

The default gradient direction update is the identity; it just evaluates the gradient.
Manopt.MomentumGradient — Type

MomentumGradient <: DirectionUpdateRule

Append a momentum to a gradient processor, where the last direction and last iterate are stored and the new direction is composed as $η_i = m η_{i-1}' - s d_i$, where $s d_i$ is the current (inner) direction including the step size and $η_{i-1}'$ is the last direction, vector transported to the current tangent space and weighted by the momentum $m$.

Fields

- p_old: (rand(M)) remember the last iterate for parallel transporting the last direction
- momentum: (0.2) factor for the momentum
- direction: internal DirectionUpdateRule to determine the directions to add the momentum to
- vector_transport_method: (default_vector_transport_method(M, typeof(p))) vector transport method to use
- X_old: (zero_vector(M, p_old)) the last gradient/direction update, added as momentum
Constructors

Add momentum to a gradient problem, where by default just a gradient evaluation is used.

MomentumGradient(
    M::AbstractManifold;
    p=rand(M),
    s::DirectionUpdateRule=IdentityUpdateRule(),
    X=zero_vector(M, p),
    momentum=0.2,
    vector_transport_method=default_vector_transport_method(M, typeof(p)),
)

Initialize a momentum gradient rule to s, where p and X are memory for interim values.
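As a sketch of how such a rule is used (reusing the objects from the first example; the parameter values are illustrative), it is passed via the direction keyword of gradient_descent:

# gradient descent with a momentum term added to each direction
p_m = gradient_descent(
    M, f, grad_f, data[1];
    direction=MomentumGradient(M; p=data[1], momentum=0.2),
)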
Manopt.AverageGradient — Type

AverageGradient <: DirectionUpdateRule

Add an average of gradients to a gradient processor. A set of previous directions (from the inner processor) and the last iterate are stored; the average is taken after vector transporting them to the current iterate's tangent space.

Fields

- gradients: the last n gradient/direction updates
- last_iterate: last iterate (needed to transport the gradients)
- direction: internal DirectionUpdateRule to determine directions to apply the averaging to
- vector_transport_method: vector transport method to use
Constructors

AverageGradient(
    M::AbstractManifold,
    p::P=rand(M);
    n::Int=10,
    s::DirectionUpdateRule=IdentityUpdateRule(),
    gradients=fill(zero_vector(M, p), n),
    last_iterate=deepcopy(p),
    vector_transport_method=default_vector_transport_method(M, typeof(p)),
)

Add average to a gradient problem, where

- n: determines the size of averaging
- s: is the internal DirectionUpdateRule to determine the gradients to store
- gradients: can be pre-filled with some history
- last_iterate: stores the last iterate
- vector_transport_method: determines how to transport all gradients to the current iterate's tangent space before averaging
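Analogously, an averaging rule can be passed via the same keyword; again a sketch with illustrative values, reusing the objects from the first example:

# average the last 10 transported gradients before each step
p_avg = gradient_descent(
    M, f, grad_f, data[1];
    direction=AverageGradient(M, data[1]; n=10),
)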
Manopt.Nesterov — Type

Nesterov <: DirectionUpdateRule

Fields

- γ
- μ: the strong convexity coefficient
- v: (=$v_k$, $v_0=x_0$) an interim point to compute the next gradient evaluation point $y_k$
- shrinkage: (= i -> 0.8) a function to compute the shrinkage $β_k$ per iterate

Assume $f$ is $L$-Lipschitz and $μ$-strongly convex. Given

- a step size $h_k<\frac{1}{L}$ (from the GradientDescentState),
- a shrinkage parameter $β_k$,
- a current iterate $x_k$,
- as well as the interim values $γ_k$ and $v_k$ from the previous iterate,
this computes a Nesterov-type update using the following steps, see [ZS18]:

- Compute the positive root $α_k∈(0,1)$ of $α^2 = h_k\bigl((1-α_k)γ_k+α_k μ\bigr)$.
- Set $\bar γ_{k+1} = (1-α_k)γ_k + α_kμ$.
- $y_k = \operatorname{retr}_{x_k}\Bigl(\frac{α_kγ_k}{γ_k + α_kμ}\operatorname{retr}^{-1}_{x_k}v_k \Bigr)$
- $x_{k+1} = \operatorname{retr}_{y_k}(-h_k \operatorname{grad}f(y_k))$
- $v_{k+1} = \operatorname{retr}_{y_k}\Bigl(\frac{(1-α_k)γ_k}{\bar γ_{k+1}}\operatorname{retr}_{y_k}^{-1}(v_k) - \frac{α_k}{\bar γ_{k+1}}\operatorname{grad}f(y_k) \Bigr)$
- $γ_{k+1} = \frac{1}{1+β_k}\bar γ_{k+1}$

Then the direction from $x_k$ to $x_{k+1}$, given by $d = \operatorname{retr}^{-1}_{x_k}x_{k+1}$, is returned.
Constructor

Nesterov(M::AbstractManifold, p::P; γ=0.001, μ=0.9, shrinkage=k -> 0.8,
    inverse_retraction_method=LogarithmicInverseRetraction())

Initialize the Nesterov acceleration, where p initializes v.
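A corresponding sketch for the accelerated rule, with illustrative parameter values and again reusing the objects from the first example; recall that the analysis above assumes strong convexity:

# Nesterov-type acceleration as the direction update
p_n = gradient_descent(
    M, f, grad_f, data[1];
    direction=Nesterov(M, data[1]; γ=0.001, μ=0.9),
)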
Debug actions

Manopt.DebugGradient — Type

DebugGradient <: DebugAction

debug for the gradient evaluated at the current iterate

Constructors

DebugGradient(; long=false, prefix= , format= "$prefix%s", io=stdout)

display the short (false) or long (true) default text for the gradient, or set the prefix manually. Alternatively the complete format can be set.
Manopt.DebugGradientNorm — Type

DebugGradientNorm <: DebugAction

debug for the norm of the gradient evaluated at the current iterate.

Constructors

DebugGradientNorm([long=false, p=print])

display the short (false) or long (true) default text for the gradient norm.

DebugGradientNorm(prefix[, p=print])

display a prefix in front of the gradient norm.

Manopt.DebugStepsize — Type

DebugStepsize <: DebugAction

debug for the current step size.

Constructors

DebugStepsize(; long=false, prefix="step size:", format="$prefix%s", io=stdout)

display a prefix in front of the step size.
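These actions are handed to the solver through the debug keyword, mixed with symbols and strings as separators; a sketch reusing the first example:

# print iteration number, gradient norm, and step size in every iteration
gradient_descent(
    M, f, grad_f, data[1];
    debug=[:Iteration, " | ", DebugGradientNorm(), " | ", DebugStepsize(), "\n"],
)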
Record actions

Manopt.RecordGradient — Type

RecordGradient <: RecordAction

record the gradient evaluated at the current iterate

Constructors

RecordGradient(ξ)

initialize the RecordAction to the corresponding type of the tangent vector.

Manopt.RecordGradientNorm — Type

RecordGradientNorm <: RecordAction

record the norm of the current gradient

Manopt.RecordStepsize — Type

RecordStepsize <: RecordAction

record the step size
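Record actions are passed via the record keyword; to access the recorded values afterwards, return the whole solver state. A sketch reusing the first example:

# record the gradient norm and the step size in every iteration
R = gradient_descent(
    M, f, grad_f, data[1];
    record=[RecordGradientNorm(), RecordStepsize()],
    return_state=true,
)
rec = get_record(R)     # the recorded values, one entry per iteration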
Technical details

The gradient_descent solver requires the following functions of a manifold to be available

- A retract!(M, q, p, X); it is recommended to set the default_retraction_method to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
- By default gradient descent uses ArmijoLinesearch, which requires max_stepsize(M) to be set and an implementation of inner(M, p, X).
- By default the stopping criterion uses the norm as well, to stop when the norm of the gradient is small; if you implemented inner, the norm is provided already.
- By default the tangent vector storing the gradient is initialized by calling zero_vector(M, p).
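On a manifold from Manifolds.jl these requirements are typically already met; a quick check on the sphere (an illustrative choice) could look as follows:

using Manifolds, Manopt

M = Sphere(2)
p = rand(M)
X = rand(M; vector_at=p)                  # a tangent vector at p

default_retraction_method(M, typeof(p))   # retraction used if none is specified
q = retract(M, p, X)                      # allocating variant of retract!
inner(M, p, X, X)                         # inner product; also yields norm(M, p, X)
zero_vector(M, p)                         # initial memory for the gradient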
Literature
- [Lue72] D. G. Luenberger. The gradient projection method along geodesics. Management Science 18, 620–631 (1972).
- [ZS18] H. Zhang and S. Sra. Towards Riemannian accelerated gradient methods, arXiv preprint arXiv:1806.02812 (2018).