Solving Rosenbrock with the Difference of Convex Algorithm

Ronny Bergmann 2023-06-06

Introduction

This example illustrates how the 📖 Rosenbrock problem can be rephrased as a difference of convex problem and with a new metric on Euclidean space. This example is the code that produces the results in [BFSS23], Section 7.2.

Both the Rosenbrock problem

\[ \operatorname*{argmin}_{x\in ℝ^2} a\bigl( x_1^2-x_2\bigr)^2 + \bigl(x_1-b\bigr)^2,\]

where $a,b>0$ and usually $b=1$ and $a \gg b$, we know the minimizer $x^* = (b,b^2)^\mathrm{T}$, and also the (Euclidean) gradient

\[\nabla f(x) = \begin{pmatrix} 4a(x_1^2-x_2)\\ -2a(x_1^2-x_2) \end{pmatrix} + \begin{pmatrix} 2(x_1-b)\\ 0 \end{pmatrix}.\]

They are even available already here in ManifoldExamples.jl, see RosenbrockCost and RosenbrockGradient!!.

Furthermore, the RosenbrockMetric can be used on $ℝ^2$, that is

\[⟨X,Y⟩_{\mathrm{Rb},p} = X^\mathrm{T}G_pY, \qquad G_p = \begin{pmatrix} 1+4p_1^2 & -2p_1 \\ -2p_1 & 1 \end{pmatrix},\]

In this example we want to explore four different approaches to minimizing the Rosenbrock example, that are all based on first-order methods, i.e. using a gradient but not a Hessian.

  1. The Euclidean Gradient
  2. The Riemannian gradient descent with respect to the RosenbrockMetric
  3. The Euclidean Difference of Convex Algorithm
  4. The Riemannian Difference of Convex Algorithm respect to the RosenbrockMetric

Where we obtain a difference of convex problem by writing

\[f(x) = a\bigl( x_1^2-x_2\bigr)^2 + \bigl(x_1-b\bigr)^2 = a\bigl( x_1^2-x_2\bigr)^2 + 2\bigl(x_1-b\bigr)^2 - \bigl(x_1-b\bigr)^2\]

that is

\[g(x) = a\bigl( x_1^2-x_2\bigr)^2 + 2\bigl(x_1-b\bigr)^2 \quad\text{ and }\quad h(x) = \bigl(x_1-b\bigr)^2\]

using LinearAlgebra, Random, Statistics
using Manifolds, Manopt, ManoptExamples
using NamedColors, Plots
import Manopt: set_manopt_parameter!
Random.seed!(42)
paul_tol = load_paul_tol()
indigo = paul_tol["mutedindigo"]
green = paul_tol["mutedgreen"]
sand = paul_tol["mutedsand"]
teal = paul_tol["mutedteal"]
grey = paul_tol["mutedgrey"]

To emphasize the effect, we choose a quite large value of a.

a = 2*10^5
b = 1

and use the starting point and a direction to check gradients

p0 = [0.1, 0.2]

The Euclidean Gradient Descent.

For the Euclidean gradient we can just use the same approach as in the Rosenbrock example

M = ℝ^2
f = ManoptExamples.RosenbrockCost(M; a=a, b=b)
∇f!! = ManoptExamples.RosenbrockGradient!!(M; a=a, b=b)

define a common debug vector

debug_vec = [
        (:Iteration, "# %-8d "),
        (:Cost, "F(x): %1.4e"),
        " ",
        (:Change, "|δp|: %1.4e | "),
        (:GradientNorm, "|grad f|: %1.6e"),
        :Stop,
        "\n",
    ]

and call the gradient descent algorithm

Eucl_GD_state = gradient_descent(M, f, ∇f!!, p0;
    evaluation=InplaceEvaluation(),
    debug=[debug_vec...,10^7],
    stopping_criterion=StopAfterIteration(10^8) | StopWhenChangeLess(1e-16),
    record=[:Iteration, :Cost],
    return_state=true,
)
Initial F(x): 7.2208e+03 
# 10000000 F(x): 8.9937e-06 |δp|: 1.3835e+00 | |grad f|: 8.170355e-03
# 20000000 F(x): 2.9474e-09 |δp|: 6.5764e-03 | |grad f|: 1.419191e-04
# 30000000 F(x): 9.8376e-13 |δp|: 1.1918e-04 | |grad f|: 2.526295e-06
# 40000000 F(x): 3.2830e-16 |δp|: 2.1773e-06 | |grad f|: 4.526313e-08
# 50000000 F(x): 1.0154e-19 |δp|: 3.9803e-08 | |grad f|: 6.838240e-10
At iteration 53073227 the algorithm performed a step with a change (0.0) less than 1.0e-16.

# Solver state for `Manopt.jl`s Gradient Descent
After 53073227 iterations

## Parameters
* retraction method: ExponentialRetraction()

## Stepsize
ArmijoLinesearch() with keyword parameters
  * initial_stepsize    = 1.0
  * retraction_method   = ExponentialRetraction()
  * contraction_factor  = 0.95
  * sufficient_decrease = 0.1

## Stopping criterion

Stop When _one_ of the following are fulfilled:
    Max Iteration 100000000:    not reached
    |Δp| < 1.0e-16: reached
Overall: reached
This indicates convergence: Yes

## Debug
    :Iteration = [(:Iteration, "# %-8d "), (:Cost, "F(x): %1.4e"), " ", (:Change, "|δp|: %1.4e | "), (:GradientNorm, "|grad f|: %1.6e"), "\n", 10000000]
    :Stop = :Stop

## Record
(Iteration = RecordGroup([RecordIteration(), RecordCost()]),)

The Riemannian Gradient Descent.

For the Riemannian case, we define

M_rb = MetricManifold(M, ManoptExamples.RosenbrockMetric())
MetricManifold(Euclidean(2; field=ℝ), ManoptExamples.RosenbrockMetric())

and the gradient is now adopted to the new metric

function grad_f!(M, X, p)
    ∇f!!(M, X, p)
    riemannian_gradient!(M, X, p, X)
    return X
end
function grad_f(M, p)
    X = zero_vector(M, p)
    return grad_f!(M, X, p)
end
R_GD_state = gradient_descent(M_rb, f, grad_f!, p0;
    evaluation=InplaceEvaluation(),
    debug=[debug_vec...,10^6],
    stopping_criterion=StopAfterIteration(10^8) | StopWhenChangeLess(1e-16),
    record=[:Iteration, :Cost],
    return_state=true,
)
Initial F(x): 7.2208e+03 
# 1000000  F(x): 1.3571e-09 |δp|: 9.1006e-01 | |grad f|: 1.974939e-04
# 2000000  F(x): 2.7921e-18 |δp|: 3.6836e-05 | |grad f|: 9.240792e-09
At iteration 2443750 the algorithm performed a step with a change (0.0) less than 1.0e-16.

# Solver state for `Manopt.jl`s Gradient Descent
After 2443750 iterations

## Parameters
* retraction method: ExponentialRetraction()

## Stepsize
ArmijoLinesearch() with keyword parameters
  * initial_stepsize    = 1.0
  * retraction_method   = ExponentialRetraction()
  * contraction_factor  = 0.95
  * sufficient_decrease = 0.1

## Stopping criterion

Stop When _one_ of the following are fulfilled:
    Max Iteration 100000000:    not reached
    |Δp| < 1.0e-16: reached
Overall: reached
This indicates convergence: Yes

## Debug
    :Iteration = [(:Iteration, "# %-8d "), (:Cost, "F(x): %1.4e"), " ", (:Change, "|δp|: %1.4e | "), (:GradientNorm, "|grad f|: %1.6e"), "\n", 1000000]
    :Stop = :Stop

## Record
(Iteration = RecordGroup([RecordIteration(), RecordCost()]),)

The Euclidean Difference of Convex

For the convex case, we have to first introduce the two parts of the cost.

f1(M, p; a=100, b=1) = a * (p[1]^2 - p[2])^2;
f2(M, p; a=100, b=1) = (p[1] - b[1])^2;
g(M, p; a=100, b=1) = f1(M, p; a=a, b=b) + 2 * f2(M, p; a=a, b=b)
h(M, p; a=100, b=1) = f2(M, p; a=a, b=b)

and their (Euclidan) gradients

function ∇h!(M, X, p; a=100, b=1)
    X[1] = 2*(p[1]-b)
    X[2] = 0
    return X
end
function ∇h(M, p; a=100, b=1)
    X = zero(p)
    ∇h!(M, X, p; a=a, b=b)
    return X
end
function ∇g!(M, X, p; a=100, b=1)
    X[1] = 4*a*(p[1]^2-p[2])*p[1] + 2*2*(p[1]-b)
    X[2] = -2*a*(p[1]^2-p[2])
    return X
end
function ∇g(M, p; a=100, b=1)
    X = zero(p)
    ∇g!(M, X, p; a=a, b=b)
    return X
end

and we define for convenience

docE_g(M, p) = g(M, p; a=a, b=b)
docE_f(M,p) = docE_g(M,p) - h(M, p; a=a, b=b)
docE_∇h!(M, X, p) = ∇h!(M, X, p; a=a, b=b)
docE_∇g!(M, X, p) = ∇g!(M, X, p; a=a, b=b)
function docE_∇f!(M, X, p)
  Y = zero_vector(M, p)
  docE_∇g!(M, X, p)
  docE_∇h!(M, Y, p)
  X .-= Y
  return X
end

Then we call the difference of convex algorithm on Eucldiean space $ℝ^2$.

E_doc_state = difference_of_convex_algorithm(
    M, docE_f, docE_g, docE_∇h!, p0;
    gradient=docE_∇f!,
    grad_g = docE_∇g!,
    debug=[debug_vec..., 10^4],
    evaluation=InplaceEvaluation(),
    record=[:Iteration, :Cost],
    stopping_criterion=StopAfterIteration(10^8) | StopWhenChangeLess(1e-16),
    sub_hess=nothing, # Use gradient descent
    sub_stopping_criterion=StopAfterIteration(2000) | StopWhenGradientNormLess(1e-16),
    return_state=true,
)
Initial F(x): 7.2208e+03 
# 10000    F(x): 2.9705e-09 |δp|: 1.3270e+00 | |grad f|: 1.388203e-04
# 20000    F(x): 3.3302e-16 |δp|: 1.2173e-04 | |grad f|: 4.541087e-08
At iteration 26549 the algorithm performed a step with a change (0.0) less than 1.0e-16.

# Solver state for `Manopt.jl`s Difference of Convex Algorithm
After 26549 iterations

## Parameters
* sub solver state:
    | # Solver state for `Manopt.jl`s Gradient Descent
    | After 2000 iterations
    | 
    | ## Parameters
    | * retraction method: ExponentialRetraction()
    | 
    | ## Stepsize
    | ArmijoLinesearch() with keyword parameters
    |   * initial_stepsize    = 1.0
    |   * retraction_method   = ExponentialRetraction()
    |   * contraction_factor  = 0.95
    |   * sufficient_decrease = 0.1
    | 
    | ## Stopping criterion
    | 
    | Stop When _one_ of the following are fulfilled:
    |     Max Iteration 2000:   reached
    |     |grad f| < 1.0e-16: not reached
    | Overall: reached
    | This indicates convergence: No

## Stopping criterion

Stop When _one_ of the following are fulfilled:
    Max Iteration 100000000:    not reached
    |Δp| < 1.0e-16: reached
Overall: reached
This indicates convergence: Yes

## Debug
    :Iteration = [(:Iteration, "# %-8d "), (:Cost, "F(x): %1.4e"), " ", (:Change, "|δp|: %1.4e | "), (:GradientNorm, "|grad f|: %1.6e"), "\n", 10000]
    :Stop = :Stop

## Record
(Iteration = RecordGroup([RecordIteration(), RecordCost()]),)

The Riemannian Difference of Convex

We first have to again defined the gradients with respect to the new metric

function grad_h!(M, X, p; a=100, b=1)
    ∇h!(M, X, p; a=a, b=b)
    riemannian_gradient!(M, X, p, X)
    return X
end
function grad_h(M, p; a=100, b=1)
    X = zero(p)
    grad_h!(M, X, p; a=a, b=b)
    return X
end
function grad_g!(M, X, p; a=100, b=1)
    ∇g!(M, X, p; a=a,b=b)
    riemannian_gradient!(M, X, p, X)
    return X
end
function grad_g(M, p; a=100, b=1)
    X = zero(p)
    grad_g!(M, X, p; a=a, b=b)
    return X
end

While the cost of the subgradient can be infered automaticallty, we also have to provide the gradient of the sub problem. For $X \in ∂h(p^{(k)})$ the sunproblem top determine $p^{(k+1)}$ reads

\[\operatorname*{argmin}_{p\in\mathcal M} g(p) - \langle X, \log_{p^{(k)}}p\rangle\]

for which usually the cost and gradient functions are computed automatically in the difference of convex algorithm. However, in our case first the closed form solution for the adjoint differential of the logaithmic map is complicated to compute and second the gradint can even be given in a nicer form. We can first simplify in our case with $X = \operatorname{grad} h(p^{(k)})$ that

\[\phi(p) = g(p) - \langle X, \log_{p^{(k)}}p\rangle = a\bigl( p_{1}^2-p_{2}\bigr)^2 + 2\bigl(p_{1}-b\bigr)^2 - 2(p^{(k)}_1-b)p_1 + 2(p^{(k)}_1-b)p^{(k)}_1,\]

its Euclidean gradient reads

\[\operatorname{grad}\phi(p) = \nabla \varphi(p) = \begin{pmatrix} 4a p_1(p_1^2-p_2) + 4(p_1-b) - 2(p^{(k)}_1-b)\\ -2a(p_1^2-p_2) \end{pmatrix}\]

where we can again employ the gradient conversion from before to obtain the Riemannian gradient.

mutable struct SubGrad{P,T,V}
    pk::P
    Xk::T
    a::V
    b::V
end
function (ϕ::SubGrad)(M, p)
    X = zero_vector(M, p)
    ϕ(M, X, p)
    return X
end
function (ϕ::SubGrad)(M, X, p)
    X .= [
        4 * ϕ.a * p[1] * (p[1]^2 - p[2]) + 4 * (p[1] - ϕ.b) - 2 * (ϕ.pk[1] - ϕ.b),
        -2 * ϕ.a * (p[1]^2 - p[2]),
    ]
    riemannian_gradient!(M, X, p, X) # convert
    return X
end

And in orer to update the subsolvers gradient correctly, we have to overwrite

set_manopt_parameter!(ϕ::SubGrad, ::Val{:p}, p) = (ϕ.pk .= p)
set_manopt_parameter!(ϕ::SubGrad, ::Val{:X}, X) = (ϕ.Xk .= X)

And we again introduce for ease of use

docR_g(M, p) = g(M, p; a=a, b=b)
docR_f(M, p) = docR_g(M, p) - h(M, p; a=a, b=b)
docR_grad_h!(M, X, p) = grad_h!(M, X, p; a=a, b=b)
docR_grad_g!(M, X, p) = grad_g!(M, X, p; a=a, b=b)
function docR_grad_f!(M, X, p)
    Y = zero_vector(M, p)
    docR_grad_g!(M, X, p)
    docR_grad_h!(M, Y, p)
    X .-= Y
    return X
end
docR_sub_grad = SubGrad(copy(M, p0), zero_vector(M, p0), a, b)

Then we can finally call the last of our four algorithms to compare, the difference of convex algorithm with the Riemannian metric.

R_doc_state = difference_of_convex_algorithm(
    M_rb, docR_f, docR_g, docR_grad_h!, p0;
    gradient=docR_grad_f!,
    grad_g = docR_grad_g!,
    debug=[debug_vec..., 10^6],
    evaluation=InplaceEvaluation(),
    record=[:Iteration, :Cost],
    stopping_criterion=StopAfterIteration(10^8) | StopWhenChangeLess(1e-16),
    sub_grad=docR_sub_grad,
    sub_hess = nothing, # Use gradient descent
    sub_stopping_criterion=StopAfterIteration(2000) | StopWhenGradientNormLess(1e-16),
    return_state=true,
)
Initial F(x): 7.2208e+03 
At iteration 1235 the algorithm performed a step with a change (0.0) less than 1.0e-16.

# Solver state for `Manopt.jl`s Difference of Convex Algorithm
After 1235 iterations

## Parameters
* sub solver state:
    | # Solver state for `Manopt.jl`s Gradient Descent
    | After 2000 iterations
    | 
    | ## Parameters
    | * retraction method: ExponentialRetraction()
    | 
    | ## Stepsize
    | ArmijoLinesearch() with keyword parameters
    |   * initial_stepsize    = 1.0
    |   * retraction_method   = ExponentialRetraction()
    |   * contraction_factor  = 0.95
    |   * sufficient_decrease = 0.1
    | 
    | ## Stopping criterion
    | 
    | Stop When _one_ of the following are fulfilled:
    |     Max Iteration 2000:   reached
    |     |grad f| < 1.0e-16: not reached
    | Overall: reached
    | This indicates convergence: No

## Stopping criterion

Stop When _one_ of the following are fulfilled:
    Max Iteration 100000000:    not reached
    |Δp| < 1.0e-16: reached
Overall: reached
This indicates convergence: Yes

## Debug
    :Iteration = [(:Iteration, "# %-8d "), (:Cost, "F(x): %1.4e"), " ", (:Change, "|δp|: %1.4e | "), (:GradientNorm, "|grad f|: %1.6e"), "\n", 1000000]
    :Stop = :Stop

## Record
(Iteration = RecordGroup([RecordIteration(), RecordCost()]),)

Comparison in Iterations

fig = plot(;
    legend=:topright,
    xlabel=raw"Iterations $k$ (log. scale)", ylabel=raw"Cost $f(x)$ (log. scale)",
    yaxis=:log,
    ylims=(1e-16, 5*1e5),
    xaxis=:log,
    xlims=(1,10^8),
)
scatter!(fig, [1,], [f(M,p0),], label=raw"$f(p_0)$", markercolor=grey)
egi = get_record(Eucl_GD_state, :Iteration, :Iteration)[1:10000:end] #5308 entries
egc = get_record(Eucl_GD_state, :Iteration, :Cost)[1:10000:end] #5308 entries
plot!(fig, egi, egc, color=teal, label="Euclidean GD")
#
rgi = get_record(R_GD_state, :Iteration, :Iteration)[1:1000:end] # 2444 entries
rgc = get_record(R_GD_state, :Iteration, :Cost)[1:1000:end] # 2444 entries
plot!(fig, rgi, rgc, color=indigo, label="Riemannian GD")
#
edi = get_record(E_doc_state, :Iteration, :Iteration) #26549 entries
edc = get_record(E_doc_state, :Iteration, :Cost) #26549 entries
plot!(fig, edi, edc, color=sand, label="Euclidean DoC")
#
rdi = get_record(R_doc_state, :Iteration, :Iteration) # 1235 entries
rdc = get_record(R_doc_state, :Iteration, :Cost) # 1235 entries
plot!(fig, rdi, rdc, color=green, label="Riemannian DoC")

And we can see that using difference of convex outperforms gradient descent, and using the Riemannian approach required less iterations than their Euclidean counterparts.

Literature

[BFSS23]
R. Bergmann, O. P. Ferreira, E. M. Santos and J. C. Souza. The difference of convex algorithm on Hadamard manifolds. Preprint (2023), arXiv:2112.05250.