Difference of convex
Difference of convex algorithm
Manopt.difference_of_convex_algorithm
— Function `difference_of_convex_algorithm(M, f, g, ∂h, p=rand(M); kwargs...)`
`difference_of_convex_algorithm(M, mdco, p; kwargs...)`
Compute the difference of convex algorithm [BFSS23] to minimize
\[ \operatorname*{arg\,min}_{p∈\mathcal M}\ g(p) - h(p)\]
where you need to provide $f(p) = g(p) - h(p)$, $g$ and the subdifferential $∂h$ of $h$.
This algorithm performs the following steps given a start point `p` $= p^{(0)}$. Repeat for $k=0,1,\ldots$
- Take $X^{(k)} ∈ ∂h(p^{(k)})$
- Set the next iterate to the solution of the subproblem
\[ p^{(k+1)} ∈ \operatorname*{argmin}_{q ∈ \mathcal M} g(q) - ⟨X^{(k)}, \log_{p^{(k)}}q⟩\]
until the `stopping_criterion` is fulfilled.
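The iteration above can be sketched in plain Julia on Euclidean space $ℝ^n$, where $\log_p q = q - p$. This is an illustrative, self-contained sketch, not Manopt.jl API: the names `dca`, `g`, `h`, `∂h`, and `grad_g` are made up here, and the subproblem is solved by a few gradient-descent steps.

```julia
using LinearAlgebra

g(p) = dot(p, p)            # g(p) = ‖p‖², convex
h(p) = dot([1.0, 1.0], p)   # h(p) = ⟨a, p⟩ with a = (1, 1), convex
∂h(p) = [1.0, 1.0]          # subgradient of h (a singleton here)
grad_g(p) = 2p

function dca(p; iterations=20, sub_steps=200, eta=0.1)
    for _ in 1:iterations
        X = ∂h(p)                 # take X ∈ ∂h(p)
        q = copy(p)               # subproblem: argmin_q g(q) - ⟨X, q - p⟩
        for _ in 1:sub_steps
            q -= eta .* (grad_g(q) .- X)
        end
        p = q                     # next iterate is the subproblem solution
    end
    return p
end

# f(p) = ‖p‖² - ⟨a, p⟩ has the unique minimizer a/2 = [0.5, 0.5]
p_min = dca([3.0, -2.0])
```

On a general manifold the inner product term uses $\log_{p^{(k)}}q$ instead of $q - p$; Manopt.jl handles this through the retraction/inverse-retraction machinery described below.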
Optional parameters
- `evaluation` (`AllocatingEvaluation`) specify whether the gradient works by allocation (default) in the form `grad_f(M, p)` or `InplaceEvaluation` in the form `grad_f!(M, X, p)`
- `gradient` (`nothing`) specify $\operatorname{grad} f$, for debug / analysis or enhancing the `stopping_criterion=`
- `grad_g` (`nothing`) specify the gradient of `g`. If specified, a subsolver is automatically set up.
- `initial_vector` (`zero_vector(M, p)`) initialise the inner tangent vector to store the subgradient result.
- `stopping_criterion` (`StopAfterIteration(200) | StopWhenChangeLess(1e-8)`) a `StoppingCriterion` for the algorithm. This includes a `StopWhenGradientNormLess(1e-8)` when a `gradient` is provided.
If you specify the `ManifoldDifferenceOfConvexObjective` `mdco`, additionally:
- `g` (`nothing`) specify the function `g`. If specified, a subsolver is automatically set up.
While there are several parameters for a sub solver, the easiest is to provide the function `grad_g=`, such that together with the mandatory function `g` a default cost and gradient can be generated and passed to a default subsolver. Hence the easiest example call looks like

`difference_of_convex_algorithm(M, f, g, grad_h, p; grad_g=grad_g)`
Optional parameters for the sub problem
- `sub_cost` (`LinearizedDCCost(g, p, initial_vector)`) a cost to be used within the default `sub_problem`. Use this if you have a more efficient version than the default that is built using `g` from before.
- `sub_grad` (`LinearizedDCGrad(grad_g, p, initial_vector; evaluation=evaluation)`) a gradient to be used within the default `sub_problem`. This is generated by default when `grad_g` is provided. You can specify your own by overwriting this keyword.
- `sub_hess` (a finite difference approximation by default) specify a Hessian of the subproblem, which the default solver, see `sub_state`, needs.
- `sub_kwargs` (`(;)`) pass keyword arguments to the `sub_state`, in form of a `Dict(:kwname=>value)`, unless you set the `sub_state` directly.
- `sub_objective` (a gradient or Hessian objective based on the last three keywords) provide the objective used within `sub_problem` (if that is not specified by the user)
- `sub_problem` (`DefaultManoptProblem(M, sub_objective)`) specify a manopt problem for the sub-solver runs. You can also provide a function for a closed form solution. Then `evaluation=` is taken into account for the form of this function.
- `sub_state` (`TrustRegionsState` by default, requires `sub_hess` to be provided; decorated with `sub_kwargs`) choose the solver by specifying a solver state to solve the `sub_problem`; if the `sub_problem` is a function (a closed form solution), this is set to `evaluation` and can be changed to the evaluation type of the closed form solution accordingly.
- `sub_stopping_criterion` (`StopAfterIteration(300) | StopWhenStepsizeLess(1e-9) | StopWhenGradientNormLess(1e-9)`) a stopping criterion used within the default `sub_state=`
- `sub_stepsize` (`ArmijoLinesearch(M)`) specify a step size used within the `sub_state`

All other keyword arguments are passed on to decorate the inner `DifferenceOfConvexState`.
Output
the obtained (approximate) minimizer $p^*$, see get_solver_return
for details
Manopt.difference_of_convex_algorithm!
— Function `difference_of_convex_algorithm!(M, f, g, ∂h, p; kwargs...)`
`difference_of_convex_algorithm!(M, mdco, p; kwargs...)`
Run the difference of convex algorithm and perform the steps in place of `p`. See `difference_of_convex_algorithm` for more details.
If you specify the `ManifoldDifferenceOfConvexObjective` `mdco`, then `g` is a keyword argument.
Difference of convex proximal point
Manopt.difference_of_convex_proximal_point
— Function `difference_of_convex_proximal_point(M, grad_h, p=rand(M); kwargs...)`
`difference_of_convex_proximal_point(M, mdcpo, p=rand(M); kwargs...)`
Compute the difference of convex proximal point algorithm [SO15] to minimize
\[ \operatorname*{arg\,min}_{p∈\mathcal M} g(p) - h(p)\]
where you have to provide the (sub)gradient $∂h$ of $h$ and either
- the proximal map $\operatorname{prox}_{\lambda g}$ of `g` as a function `prox_g(M, λ, p)` or `prox_g(M, q, λ, p)`
- the functions `g` and `grad_g` to compute the proximal map using a sub solver
- your own sub-solver, see the optional keywords below
This algorithm performs the following steps given a start point `p` $= p^{(0)}$. Repeat for $k=0,1,\ldots$
- $X^{(k)} ∈ \operatorname{grad} h(p^{(k)})$
- $q^{(k)} = \operatorname{retr}_{p^{(k)}}(λ_kX^{(k)})$
- $r^{(k)} = \operatorname{prox}_{λ_kg}(q^{(k)})$
- $X^{(k)} = \operatorname{retr}^{-1}_{p^{(k)}}(r^{(k)})$
- Compute a stepsize $s_k$ and
- set $p^{(k+1)} = \operatorname{retr}_{p^{(k)}}(s_kX^{(k)})$.
until the `stopping_criterion` is fulfilled. See [ACOO20] for more details on the modified variant, where steps 4-6 are slightly changed, since here the classical proximal point method for DC functions is obtained for $s_k = 1$ and one can hence employ usual line search methods.
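For intuition, the six steps can be sketched in plain Julia on Euclidean $ℝ^n$, where both retractions reduce to vector addition and subtraction, and the prox of $g(p)=\frac{1}{2}\lVert p\rVert^2$ has the closed form $\operatorname{prox}_{λg}(q) = q/(1+λ)$. All names here are illustrative, not Manopt.jl API.

```julia
using LinearAlgebra

a = [1.0, 2.0]
grad_h(p) = a                   # h(p) = ⟨a, p⟩
prox_g(λ, q) = q ./ (1 + λ)     # prox of g(p) = ½‖p‖²

function dcppa(p; λ=0.5, iterations=60, s=1.0)
    for _ in 1:iterations
        X = grad_h(p)           # step 1: X ∈ grad h(p)
        q = p .+ λ .* X         # step 2: retraction is + here
        r = prox_g(λ, q)        # step 3: proximal map of g
        X = r .- p              # step 4: inverse retraction is - here
        p = p .+ s .* X         # steps 5-6 with constant stepsize s
    end
    return p
end

# f(p) = ½‖p‖² - ⟨a, p⟩ has the unique minimizer p = a = [1.0, 2.0]
p_res = dcppa([5.0, -3.0])
```

With $s = 1$ the update collapses to $p^{(k+1)} = r^{(k)}$, which is exactly the classical DC proximal point step mentioned above.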
Optional parameters
- `λ`: (`i -> 1/2`) a function returning the sequence of prox parameters $λ_i$
- `evaluation`: (`AllocatingEvaluation`) specify whether the gradient works by allocation (default) in the form `gradF(M, x)` or `InplaceEvaluation` in place, in the form `gradF!(M, X, x)`.
- `cost`: (`nothing`) provide the cost `f` for debug reasons / analysis of the default `sub_problem`. Use this if you have a more efficient version than using `g` from before.
- `gradient`: (`nothing`) specify $\operatorname{grad} f$, for debug / analysis or enhancing the `stopping_criterion`
- `prox_g`: (`nothing`) specify a proximal map for the sub problem, or both of the following:
- `g`: (`nothing`) specify the function `g`.
- `grad_g`: (`nothing`) specify the gradient of `g`. If both `g` and `grad_g` are specified, a subsolver is automatically set up.
- `inverse_retraction_method`: (`default_inverse_retraction_method(M)`) an inverse retraction method to use (see step 4).
- `retraction_method`: (`default_retraction_method(M)`) a retraction to use (see step 2)
- `stepsize`: (`ConstantStepsize(M)`) specify a `Stepsize` functor to run the modified algorithm (experimental).
- `stopping_criterion`: (`StopAfterIteration(200) | StopWhenChangeLess(1e-8)`) a `StoppingCriterion` for the algorithm. This also includes a `StopWhenGradientNormLess(1e-8)` when a `gradient` is provided.
While there are several parameters for a sub solver, the easiest is to provide the functions `g` and `grad_g`, such that a default cost and gradient can be generated and passed to a default subsolver. Hence the easiest example call looks like

`difference_of_convex_proximal_point(M, grad_h, p0; g=g, grad_g=grad_g)`
Optional parameters for the sub problem
- `sub_cost`: (`ProximalDCCost(g, copy(M, p), λ(1))`) a cost to be used within the default `sub_problem`, initialized as soon as `g` is provided.
- `sub_grad`: (`ProximalDCGrad(grad_g, copy(M, p), λ(1); evaluation=evaluation)`) a gradient to be used within the default `sub_problem`, generated by default as soon as `grad_g` is provided. You can specify your own by overwriting this keyword.
- `sub_hess`: (a finite difference approximation by default) specify a Hessian of the subproblem, which the default solver, see `sub_state`, needs.
- `sub_kwargs`: (`(;)`) pass keyword arguments to the `sub_state`, in form of a `Dict(:kwname=>value)`, unless you set the `sub_state` directly.
- `sub_objective`: (a gradient or Hessian objective based on the last three keywords) provide the objective used within `sub_problem` (if that is not specified by the user)
- `sub_problem`: (`DefaultManoptProblem(M, sub_objective)`) specify a manopt problem for the sub-solver runs. You can also provide a function for a closed form solution. Then `evaluation=` is taken into account for the form of this function.
- `sub_state`: (`TrustRegionsState`, requires `sub_hess` to be provided; decorated with `sub_kwargs`) choose the solver by specifying a solver state to solve the `sub_problem`
- `sub_stopping_criterion`: (`StopAfterIteration(300) | StopWhenStepsizeLess(1e-9) | StopWhenGradientNormLess(1e-9)`) a stopping criterion used within the default `sub_state=`

All other keyword arguments are passed on to decorate the inner `DifferenceOfConvexProximalState`.
Output
the obtained (approximate) minimizer $p^*$, see get_solver_return
for details
Manopt.difference_of_convex_proximal_point!
— Function `difference_of_convex_proximal_point!(M, grad_h, p; cost=nothing, kwargs...)`
`difference_of_convex_proximal_point!(M, mdcpo, p; cost=nothing, kwargs...)`
`difference_of_convex_proximal_point!(M, mdcpo, prox_g, p; cost=nothing, kwargs...)`
Compute the difference of convex proximal point algorithm to minimize
\[ \operatorname*{arg\,min}_{p∈\mathcal M} g(p) - h(p)\]
where you have to provide the proximal map of `g` and the gradient of `h`. The computation is done in place of `p`. For all further details, especially the keyword arguments, see `difference_of_convex_proximal_point`.
Solver states
Manopt.DifferenceOfConvexState
— Type `DifferenceOfConvexState{Pr,St,P,T,SC<:StoppingCriterion} <: AbstractManoptSolverState`
A struct to store the current state of the `difference_of_convex_algorithm`. It comes in two forms, depending on the realisation of the subproblem.
Fields
- `p` the current iterate, a point on the manifold
- `X` the current subgradient, a tangent vector to `p`.
- `sub_problem` the problem for the subsolver
- `sub_state` the state of the subproblem
- `stop` a functor inheriting from `StoppingCriterion` indicating when to stop.
For the sub task, a method to solve
\[ \operatorname*{argmin}_{q∈\mathcal M}\ g(p) - ⟨X, \log_p q⟩\]
is needed. Besides a problem and options, one can also provide a function and an AbstractEvaluationType
, respectively, to indicate a closed form solution for the sub task.
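As an illustration of such a closed form solution, consider Euclidean $ℝ^n$ with $g(q)=\frac{1}{2}\lVert q\rVert^2$: the sub task $\operatorname{argmin}_q \frac{1}{2}\lVert q\rVert^2 - ⟨X, q - p⟩$ is solved exactly by $q = X$. A hypothetical closed form solution function (illustrative, ignoring `M` on $ℝ^n$) then looks like:

```julia
# Allocating form, signature (M, p, X) -> q
closed_form_sub(M, p, X) = copy(X)
# In-place form, signature (M, q, p, X) -> q
closed_form_sub!(M, q, p, X) = copyto!(q, X)

q = closed_form_sub(nothing, [1.0, 2.0], [0.5, 0.5])
```

Passing such a function instead of a problem/state pair skips the inner solver run entirely.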
Constructors
DifferenceOfConvexState(M, p, sub_problem, sub_state; kwargs...)
DifferenceOfConvexState(M, p, sub_solver; evaluation=InplaceEvaluation(), kwargs...)
Generate the state either using a solver from Manopt, given by an `AbstractManoptProblem` `sub_problem` and an `AbstractManoptSolverState` `sub_state`, or a closed form solution `sub_solver` for the sub-problem, where the function is expected to be of the form `(M, p, X) -> q` or `(M, q, p, X) -> q`; by default its `AbstractEvaluationType` `evaluation` is in-place of `q`. The elements passed to that function are the current iterate `p` and the subgradient `X` of `h`.
further keyword arguments
- `initial_vector` (`zero_vector(M, p)`) how to initialize the inner gradient tangent vector
- `stopping_criterion` (`StopAfterIteration(200)`) a stopping criterion
Manopt.DifferenceOfConvexProximalState
— Type `DifferenceOfConvexProximalState{Type} <: Options`
A struct to store the current state of the algorithm as well as the form. It comes in two forms, depending on the realisation of the subproblem
.
Fields
- `inverse_retraction_method`: (`default_inverse_retraction_method(M)`) an inverse retraction method to use within the algorithm.
- `retraction_method`: (`default_retraction_method(M)`) a type of retraction
- `p`, `q`, `r`: the current iterate, the gradient step, and the prox, respectively; their type is set by initializing `p`
- `stepsize`: (`ConstantStepsize(1.0)`) a `Stepsize` function to run the modified algorithm (experimental)
- `stop`: (`StopWhenChangeLess(1e-8)`) a `StoppingCriterion`
- `X`, `Y`: (`zero_vector(M,p)`) the current gradient and descent direction, respectively; their common type is set by the keyword `X`
Constructor
DifferenceOfConvexProximalState(M, p; kwargs...)
Keyword arguments
- `X`, `retraction_method`, `inverse_retraction_method`, `stepsize` for the corresponding fields
- `stopping_criterion` for the `StoppingCriterion`
The difference of convex objective
Manopt.ManifoldDifferenceOfConvexObjective
— Type `ManifoldDifferenceOfConvexObjective{E} <: AbstractManifoldCostObjective{E}`
Specify an objective for a difference_of_convex_algorithm
.
The objective $f: \mathcal M → ℝ$ is given as
\[ f(p) = g(p) - h(p)\]
where both $g$ and $h$ are convex, lower semicontinuous and proper. Furthermore the subdifferential $∂h$ of $h$ is required.
Fields
- `cost`: an implementation of $f(p) = g(p)-h(p)$ as a function `f(M, p)`.
- `∂h!!`: a deterministic version of $∂h: \mathcal M → T\mathcal M$, in the sense that calling `∂h(M, p)` returns a subgradient of $h$ at `p`, and if there is more than one, it returns a deterministic choice.

Note that the subdifferential might be given in two possible signatures:
- `∂h(M, p)`, which does an `AllocatingEvaluation`
- `∂h!(M, X, p)`, which does an `InplaceEvaluation` in place of `X`.
as well as for the corresponding sub problem
Manopt.LinearizedDCCost
— Type `LinearizedDCCost`
A functor `(M, q) → ℝ`
to represent the inner problem of a ManifoldDifferenceOfConvexObjective
. This is a cost function of the form
\[ F_{p_k,X_k}(p) = g(p) - ⟨X_k, \log_{p_k}p⟩\]
for a point p_k
and a tangent vector X_k
at p_k
(for example outer iterates) that are stored within this functor as well.
Fields
- `g` a function
- `pk` a point on a manifold
- `Xk` a tangent vector at `pk`

Both interim values can be set using `set_manopt_parameter!(::LinearizedDCCost, ::Val{:p}, p)` and `set_manopt_parameter!(::LinearizedDCCost, ::Val{:X}, X)`, respectively.
Constructor
LinearizedDCCost(g, p, X)
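A plain-Julia analogue of this functor on Euclidean space, where $\log_{p_k} p = p - p_k$, may clarify the structure; the struct name below is made up for illustration and is not the Manopt.jl type:

```julia
using LinearAlgebra

# F_{pk,Xk}(p) = g(p) - ⟨Xk, p - pk⟩ on Euclidean ℝ^n
struct EuclideanLinearizedCost{F,P,T}
    g::F    # the convex function g
    pk::P   # the (outer) base point p_k
    Xk::T   # the (outer) tangent vector X_k at p_k
end
(lc::EuclideanLinearizedCost)(M, q) = lc.g(q) - dot(lc.Xk, q .- lc.pk)

lc = EuclideanLinearizedCost(p -> dot(p, p), [0.0, 0.0], [1.0, 1.0])
lc(nothing, [1.0, 2.0])   # ‖q‖² - ⟨Xk, q - pk⟩ = 5 - 3 = 2.0
```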
Manopt.LinearizedDCGrad
— Type `LinearizedDCGrad`
A functor `(M, X, p)`
to represent the gradient of the inner problem of a ManifoldDifferenceOfConvexObjective
. This is a gradient function of the form
\[ F_{p_k,X_k}(p) = g(p) - ⟨X_k, \log_{p_k}p⟩\]
its gradient is given by using $F=F_1(F_2(p))$, where $F_1(X) = ⟨X_k,X⟩$ and $F_2(p) = \log_{p_k}p$ and the chain rule as well as the adjoint differential of the logarithmic map with respect to its argument for $D^*F_2(p)$
\[ \operatorname{grad} F(q) = \operatorname{grad} g(q) - DF_2^*(q)[X_k]\]
for a point `pk` and a tangent vector `Xk` at `pk` (the outer iterates) that are stored within this functor as well.
Fields
- `grad_g!!` the gradient of $g$ (see also `LinearizedDCCost`)
- `pk` a point on a manifold
- `Xk` a tangent vector at `pk`

Both interim values can be set using `set_manopt_parameter!(::LinearizedDCGrad, ::Val{:p}, p)` and `set_manopt_parameter!(::LinearizedDCGrad, ::Val{:X}, X)`, respectively.
Constructor
LinearizedDCGrad(grad_g, p, X; evaluation=AllocatingEvaluation())
Here you specify whether `grad_g` is an `AllocatingEvaluation` or an `InplaceEvaluation`, while this functor still provides both signatures.
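As a sanity check of the gradient formula: on Euclidean $ℝ^n$ the adjoint differential of $\log_{p_k}$ is the identity, so the formula collapses to $\operatorname{grad} F(q) = \operatorname{grad} g(q) - X_k$. A central finite difference confirms this; all names below are illustrative:

```julia
using LinearAlgebra

g(p) = dot(p, p)
grad_g(p) = 2p
pk, Xk = [0.0, 1.0], [1.0, -1.0]

F(q) = g(q) - dot(Xk, q .- pk)    # the linearized cost
gradF(q) = grad_g(q) .- Xk        # its gradient on ℝ^n

q = [0.3, -0.7]
ε = 1e-6
# central finite difference of F along each coordinate direction
fd = [(F(q .+ ε .* e) - F(q .- ε .* e)) / (2ε) for e in ([1.0, 0.0], [0.0, 1.0])]
maximum(abs.(fd .- gradF(q)))     # agrees up to finite-difference error
```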
Manopt.ManifoldDifferenceOfConvexProximalObjective
— Type `ManifoldDifferenceOfConvexProximalObjective{E} <: Problem`
Specify an objective for the `difference_of_convex_proximal_point` algorithm. The problem is of the form
\[ \operatorname*{argmin}_{p∈\mathcal M} g(p) - h(p)\]
where both $g$ and $h$ are convex, lower semicontinuous and proper.
Fields
- `cost`: (`nothing`) implementation of $f(p) = g(p)-h(p)$ (optional)
- `gradient`: the gradient of the cost
- `grad_h!!`: a function $\operatorname{grad}h: \mathcal M → T\mathcal M$

Note that both gradients might be given in two possible signatures, allocating or in-place.
Constructor
ManifoldDifferenceOfConvexProximalObjective(gradh; cost=nothing, gradient=nothing)
and note that neither the cost nor the gradient is required for the algorithm, just for eventual debug or stopping criteria.
as well as for the corresponding sub problems
Manopt.ProximalDCCost
— Type `ProximalDCCost`
A functor `(M, p) → ℝ`
to represent the inner cost function of a ManifoldDifferenceOfConvexProximalObjective
. This is the cost function of the proximal map of g
.
\[ F_{p_k}(p) = \frac{1}{2λ}d_{\mathcal M}(p_k,p)^2 + g(p)\]
for a point pk
and a proximal parameter $λ$.
Fields
- `g` - a function
- `pk` - a point on a manifold
- `λ` - the prox parameter

Both interim values can be set using `set_manopt_parameter!(::ProximalDCCost, ::Val{:p}, p)` and `set_manopt_parameter!(::ProximalDCCost, ::Val{:λ}, λ)`, respectively.
Constructor
ProximalDCCost(g, p, λ)
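A plain-Julia analogue on Euclidean space, where the Riemannian distance is the Euclidean norm, shows the shape of this cost; the struct name is made up for illustration:

```julia
using LinearAlgebra

# F_{pk}(p) = (1/2λ) d(pk, p)² + g(p) with d the Euclidean distance
struct EuclideanProxCost{F,P}
    g::F          # the convex function g
    pk::P         # the prox base point p_k
    λ::Float64    # the prox parameter
end
(pc::EuclideanProxCost)(M, p) = norm(pc.pk .- p)^2 / (2 * pc.λ) + pc.g(p)

pc = EuclideanProxCost(p -> dot(p, p), [1.0, 0.0], 0.5)
pc(nothing, [0.0, 0.0])   # = 1/(2·0.5) · 1 + 0 = 1.0
```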
Manopt.ProximalDCGrad
— Type `ProximalDCGrad`
A functor `(M, X, p)`
to represent the gradient of the inner cost function of a ManifoldDifferenceOfConvexProximalObjective
. This is the gradient function of the proximal map cost function of g
. Based on
\[ F_{p_k}(p) = \frac{1}{2λ}d_{\mathcal M}(p_k,p)^2 + g(p)\]
it reads
\[ \operatorname{grad} F_{p_k}(p) = \operatorname{grad} g(p) - \frac{1}{λ}\log_p p_k\]
for a point pk
and a proximal parameter λ
.
Fields
- `grad_g` - a gradient function
- `pk` - a point on a manifold
- `λ` - the prox parameter

Both interim values can be set using `set_manopt_parameter!(::ProximalDCGrad, ::Val{:p}, p)` and `set_manopt_parameter!(::ProximalDCGrad, ::Val{:λ}, λ)`, respectively.
Constructor
ProximalDCGrad(grad_g, pk, λ; evaluation=AllocatingEvaluation())
Here you specify whether `grad_g` is an `AllocatingEvaluation` or an `InplaceEvaluation`, while this functor still always provides both signatures.
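As a sanity check of this gradient formula on Euclidean $ℝ^n$, where $\log_p p_k = p_k - p$, one can compare against a central finite difference of the prox cost; all names below are illustrative:

```julia
using LinearAlgebra

g(p) = dot(p, p); grad_g(p) = 2p
pk, λ = [1.0, -1.0], 0.5

F(p) = norm(pk .- p)^2 / (2λ) + g(p)       # the prox cost F_{pk}
gradF(p) = grad_g(p) .- (pk .- p) ./ λ     # grad g(p) - (1/λ) log_p pk on ℝ^n

p = [0.2, 0.4]
ε = 1e-6
# central finite difference of F along each coordinate direction
fd = [(F(p .+ ε .* e) - F(p .- ε .* e)) / (2ε) for e in ([1.0, 0.0], [0.0, 1.0])]
maximum(abs.(fd .- gradF(p)))              # agrees up to finite-difference error
```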
Helper functions
Manopt.get_subtrahend_gradient
— Function `X = get_subtrahend_gradient(amp, q)`
`get_subtrahend_gradient!(amp, X, q)`
Evaluate the (sub)gradient of the subtrahend `h` from within a `ManifoldDifferenceOfConvexObjective` `amp` at the point `q` (in place of `X`).
The evaluation is done in place of `X` for the `!`-variant. The `T=AllocatingEvaluation` problem might still allocate memory within. When the non-mutating variant is called with a `T=InplaceEvaluation`, memory for the result is allocated.
X = get_subtrahend_gradient(M::AbstractManifold, dcpo::ManifoldDifferenceOfConvexProximalObjective, p)
get_subtrahend_gradient!(M::AbstractManifold, X, dcpo::ManifoldDifferenceOfConvexProximalObjective, p)
Evaluate the gradient of the subtrahend $h$ from within a `ManifoldDifferenceOfConvexProximalObjective` `dcpo` at the point `p` (in place of `X`).
Technical details
The `difference_of_convex_algorithm` and `difference_of_convex_proximal_point` solvers require the following functions of a manifold to be available:
- A `retract!(M, q, p, X)`; it is recommended to set the `default_retraction_method` to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
- An `inverse_retract!(M, X, p, q)`; it is recommended to set the `default_inverse_retraction_method` to a favourite inverse retraction. If this default is set, an `inverse_retraction_method=` does not have to be specified.
- By default, one of the stopping criteria is `StopWhenChangeLess`, which requires either the `inverse_retract!` from before or the `distance(M, p, q)` for said default inverse retraction.
- A `copyto!(M, q, p)` and `copy(M, p)` for points.
- By default, the tangent vector storing the gradient is initialized calling `zero_vector(M, p)`.
- Everything the subsolver requires, which by default is `trust_regions` or, if you do not provide a Hessian, `gradient_descent`.
Literature
- [ACOO20]
- Y. T. Almeida, J. X. Cruz Neto, P. R. Oliveira and J. C. Oliveira Souza. A modified proximal point method for DC functions on Hadamard manifolds. Computational Optimization and Applications 76, 649–673 (2020).
- [BFSS23]
- R. Bergmann, O. P. Ferreira, E. M. Santos and J. C. Souza. The difference of convex algorithm on Hadamard manifolds, arXiv preprint (2023).
- [SO15]
- J. C. Souza and P. R. Oliveira. A proximal point algorithm for DC functions on Hadamard manifolds. Journal of Global Optimization 63, 797–810 (2015).