The Rayleigh Quotient
Ronny Bergmann 2024-03-09
Introduction
This example reproduces a few conceptual ideas of Optimization on Manifolds that are used throughout [Bou23] using the Rayleigh quotient and explores several different ways to use the algorithms from Manopt.jl
.
For a symmetric matrix $A \in \mathbb R^{n\times n}$ we consider the 📖 Rayleigh Quotient
\[\operatorname*{arg\,min}_{x \in \mathbb R^n \backslash \{0\}} \frac{x^{\mathrm{T}}Ax}{\lVert x \rVert^2}.\]
On the sphere we can omit the denominator and obtain
\[f(p) = p^{\mathrm{T}}Ap,\qquad p ∈ 𝕊^{n-1},\]
which by itself we can again continue in the embedding as
\[\tilde f(x) = x^{\mathrm{T}}Ax,\qquad x \in \mathbb R^n.\]
This cost has the nice feature that at the minimizer $p^*\in\mathbb S^{n-1}$ the function falue $f(p^*)$ is the smalles eigenvalue of $A$.
For the embedded function $\tilde f$ the gradient and Hessian can be computed with classical methods as
\[\begin{align*} ∇\tilde f(x) &= 2Ax, \qquad x ∈ ℝ^n, \\ ∇^2\tilde f(x)[V] &= 2AV, \qquad x, V ∈ ℝ^n. \end{align*}\]
Similarly, cf. Examples 3.62 and 5.27 of [Bou23], the Riemannian gradient and Hessian on the manifold $\mathcal M = \mathbb S^{n-1}$ are given by
\[\begin{align*} \operatorname{grad} f(p) &= 2Ap - 2(p^{\mathrm{T}}Ap)*p,\qquad p ∈ 𝕊^{n-1}, \\ \operatorname{Hess} f(p)[X] &= 2AX - 2(p^{\mathrm{T}}AX)p - 2(p^{\mathrm{T}}Ap)X,\qquad p ∈ 𝕊^{n-1}, X \in T_p𝕊^{n-1} \end{align*}\]
Let’s first generate an example martrx $A$.
using Pkg;
cd(@__DIR__)
Pkg.activate("."); # use the example environment,
using LRUCache, BenchmarkTools, LinearAlgebra, Manifolds, ManoptExamples, Manopt, Random
Random.seed!(42)
n = 500
A = Symmetric(randn(n, n) / n)
And the manifolds
M = Sphere(n-1)
Sphere(499, ℝ)
E = get_embedding(M)
Euclidean(500; field=ℝ)
Setup the corresponding functions
Since RayleighQuotientCost
, RayleighQuotientGrad!!
, and RayleighQuotientHess!!
are themselves manifold agnostic we only need to initialize them once. Agnostic here means that they would compute $f$ is called with M
as their first argument and $\tilde f$ if called with E
.
We instantiate
f = ManoptExamples.RayleighQuotientCost(A)
grad_f = ManoptExamples.RayleighQuotientGrad!!(A)
Hess_f = ManoptExamples.RayleighQuotientHess!!(A)
the suffix !!
also indicates that these functions both work as allocating and in-place variants. Given a starting point and some memory
p0 = [1.0, zeros(n-1)...]
X = zero_vector(M, p0)
we can both call
Y = grad_f(M, p0) # Allocates memory
grad_f(M, X, p0) # Computes in place of X and returns the result in X.
norm(M, p0, X-Y)
0.0
Now we can use a few different variants of solvers to approaach this and this tutorial will walk you through a few of them.
First of all let’s construct the actual result – since Rayleigh quotient minimization is not necessarily the best way to compute the smallest Eigenvalue. We can also compute
λ = min(eigvals(A)...)
-0.08924035897103727
A Solver based on gradient information
Let’s first just use first-order information and since we are just starting, maybe we only derived the Euclidean gradient $\nabla \tilde f$. We can “tell” the solver, that the provided function and the gradient are defined as the Euclidean variants in the embedding. internally, Manopt.jl
then issues the conversion for Euclidean gradients to the corresponding Riemannian one, cf. e.g. this tutorial section or Section 3.8 or more precisely Example 3.62 in [Bou23].
But instead of diving into all the tecnical details, we can just specify objective_type=:Euclidean
to trigger the conversion. We start with a simple gradient descent
s = gradient_descent(M, f, grad_f, p0; objective_type=:Euclidean,
debug = [:Iteration, :Cost, :GradientNorm, 50, "\n"],
return_state=true,
)
q1 = get_solver_result(s)
s
Initial f(x): -0.000727
# 50 f(x): -0.088242|grad f(p)|:0.003870474326981599
# 100 f(x): -0.088680|grad f(p)|:0.0034956568288634616
# 150 f(x): -0.089026|grad f(p)|:0.0026514781676923237
# 200 f(x): -0.089178|grad f(p)|:0.001531160335922979
# Solver state for `Manopt.jl`s Gradient Descent
After 200 iterations
## Parameters
* retraction method: ExponentialRetraction()
## Stepsize
ArmijoLinesearch(;
initial_stepsize=1.0
retraction_method=ExponentialRetraction()
contraction_factor=0.95
sufficient_decrease=0.1
)
## Stopping criterion
Stop When _one_ of the following are fulfilled:
Max Iteration 200: reached
|grad f| < 1.0e-8: not reached
Overall: reached
This indicates convergence: No
## Debug
:Iteration = [(:Iteration, "# %-6d"), (:Cost, "f(x): %f"), (:GradientNorm, "|grad f(p)|:%s"), "\n", 50]
From the final cost we can already see that q1
is an eigenvector to the smallest eigenvalue we obtaines above.
And we can compare this to running with the Riemannian gradient, since the RayleighQuotientGrad!!
returns this one as well, when just called with the sphere as first Argument, we just have to remove the objective_type
.
q2 = gradient_descent(M, f, grad_f, p0;
debug = [:Iteration, :Cost, :GradientNorm, 50, "\n"],
)
#Test that both are the same
isapprox(M, q1,q2)
Initial f(x): -0.000727
# 50 f(x): -0.088242|grad f(p)|:0.0038704743269815734
# 100 f(x): -0.088680|grad f(p)|:0.0034956568288633714
# 150 f(x): -0.089026|grad f(p)|:0.002651478167692353
# 200 f(x): -0.089178|grad f(p)|:0.0015311603359229782
true
We can also benchmark both
@benchmark gradient_descent($M, $f, $grad_f, $p0; objective_type=:Euclidean)
BenchmarkTools.Trial: 15 samples with 1 evaluation per sample.
Range (min … max): 338.265 ms … 362.217 ms ┊ GC (min … max): 9.61% … 9.80%
Time (median): 347.848 ms ┊ GC (median): 9.79%
Time (mean ± σ): 348.597 ms ± 7.903 ms ┊ GC (mean ± σ): 9.58% ± 0.81%
▁ █ █ ▁ ▁ ▁ ▁ ▁ █▁ ▁ ▁
█▁▁▁▁█▁█▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█ ▁
338 ms Histogram: frequency by time 362 ms <
Memory estimate: 1.13 GiB, allocs estimate: 7964.
@benchmark gradient_descent($M, $f, $grad_f, $p0)
BenchmarkTools.Trial: 160 samples with 1 evaluation per sample.
Range (min … max): 29.575 ms … 44.511 ms ┊ GC (min … max): 0.00% … 2.26%
Time (median): 30.583 ms ┊ GC (median): 1.89%
Time (mean ± σ): 31.379 ms ± 2.418 ms ┊ GC (mean ± σ): 2.40% ± 2.53%
▃█
▇█▃████▇▅▃▄▄▃▂▂▁▁▁▂▁▂▂▁▁▁▁▁▃▂▂▁▃▂▂▁▂▂▁▁▂▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
29.6 ms Histogram: frequency by time 41.5 ms <
Memory estimate: 11.20 MiB, allocs estimate: 8750.
From these results we see, that the conversion from the Euclidean to the Riemannian gradient does require a small amount of effort and hence reduces the performance slighly. Still, if the Euclidean Gradient is easier to compute or already available, this is in terms of coding the faster way. Finally this is a tradeoff between derivation and implementation efforts for the Riemannian gradient and a slight performance reduction when using the Euclidean one.
A Solver based (also) on (approximate) Hessian information
To also involve the Hessian, we consider the trust regions solver with three cases:
- Euclidean, approximating the Hessian
- Euclidean, providing the Hessian
- Riemannian, providing the Hessian but also using in-place evaluations.
q3 = trust_regions(M, f, grad_f, p0; objective_type=:Euclidean,
debug = [:Iteration, :Cost, :GradientNorm, 10, "\n"],
);
Initial f(x): -0.000727
# 10 f(x): -0.088412|grad f(p)|:0.020989167846023046
# 20 f(x): -0.089079|grad f(p)|:0.007420373217153763
# 30 f(x): -0.089095|grad f(p)|:0.0022962557538450884
# 40 f(x): -0.089095|grad f(p)|:0.0022962557538450884
# 50 f(x): -0.089095|grad f(p)|:0.002296255752372694
# 60 f(x): -0.089095|grad f(p)|:0.002296255750900326
# 70 f(x): -0.089095|grad f(p)|:0.002296255749427973
# 80 f(x): -0.089095|grad f(p)|:0.0022962557479555895
# 90 f(x): -0.089095|grad f(p)|:0.002296255746483256
# 100 f(x): -0.089095|grad f(p)|:0.002296255745010824
# 110 f(x): -0.089095|grad f(p)|:0.0022962557435384336
# 120 f(x): -0.089095|grad f(p)|:0.0022962557420660814
# 130 f(x): -0.089095|grad f(p)|:0.0022962557405937175
# 140 f(x): -0.089095|grad f(p)|:0.0022962557391213406
# 150 f(x): -0.089095|grad f(p)|:0.0022962557376489472
# 160 f(x): -0.089095|grad f(p)|:0.0022962557361765603
# 170 f(x): -0.089095|grad f(p)|:0.0022962557347042086
# 180 f(x): -0.089095|grad f(p)|:0.0022962557332318594
# 190 f(x): -0.089095|grad f(p)|:0.002296255731759444
# 200 f(x): -0.089095|grad f(p)|:0.0022962557302870605
# 210 f(x): -0.089095|grad f(p)|:0.002296255728814736
# 220 f(x): -0.089095|grad f(p)|:0.0022962557273423162
# 230 f(x): -0.089095|grad f(p)|:0.002296255725869944
# 240 f(x): -0.089095|grad f(p)|:0.002296255724397584
# 250 f(x): -0.089095|grad f(p)|:0.0022962557229252085
# 260 f(x): -0.089095|grad f(p)|:0.0022962557214528346
# 270 f(x): -0.089095|grad f(p)|:0.0022962557199804855
# 280 f(x): -0.089095|grad f(p)|:0.0022962557185080726
# 290 f(x): -0.089095|grad f(p)|:0.002296255717035685
# 300 f(x): -0.089095|grad f(p)|:0.0022962557155633144
# 310 f(x): -0.089095|grad f(p)|:0.0022962557140909605
# 320 f(x): -0.089095|grad f(p)|:0.002296255712618571
# 330 f(x): -0.089095|grad f(p)|:0.0022962557111461876
# 340 f(x): -0.089095|grad f(p)|:0.002296255709673808
# 350 f(x): -0.089095|grad f(p)|:0.0022962557082014416
# 360 f(x): -0.089095|grad f(p)|:0.00229625570672907
# 370 f(x): -0.089095|grad f(p)|:0.002296255705256684
# 380 f(x): -0.089095|grad f(p)|:0.0022962557037843165
# 390 f(x): -0.089095|grad f(p)|:0.002296255702311948
# 400 f(x): -0.089095|grad f(p)|:0.0022962557008395722
# 410 f(x): -0.089095|grad f(p)|:0.002296255699367198
# 420 f(x): -0.089095|grad f(p)|:0.0022962556978947854
# 430 f(x): -0.089095|grad f(p)|:0.002296255696422434
# 440 f(x): -0.089095|grad f(p)|:0.0022962556949500715
# 450 f(x): -0.089095|grad f(p)|:0.002296255693477701
# 460 f(x): -0.089095|grad f(p)|:0.0022962556920053264
# 470 f(x): -0.089095|grad f(p)|:0.002296255690532953
# 480 f(x): -0.089095|grad f(p)|:0.0022962556890605674
# 490 f(x): -0.089095|grad f(p)|:0.002296255687588196
# 500 f(x): -0.089095|grad f(p)|:0.002296255686115812
# 510 f(x): -0.089095|grad f(p)|:0.0022962556846434557
# 520 f(x): -0.089095|grad f(p)|:0.00229625568317109
# 530 f(x): -0.089095|grad f(p)|:0.0022962556816987227
# 540 f(x): -0.089095|grad f(p)|:0.002296255680226299
# 550 f(x): -0.089095|grad f(p)|:0.0022962556787539364
# 560 f(x): -0.089095|grad f(p)|:0.0022962556772815833
# 570 f(x): -0.089095|grad f(p)|:0.0022962556758091917
# 580 f(x): -0.089095|grad f(p)|:0.0022962556743368473
# 590 f(x): -0.089095|grad f(p)|:0.0022962556728644322
# 600 f(x): -0.089095|grad f(p)|:0.002296255671392082
# 610 f(x): -0.089095|grad f(p)|:0.0022962556699197066
# 620 f(x): -0.089095|grad f(p)|:0.002296255668447314
# 630 f(x): -0.089095|grad f(p)|:0.002296255666974941
# 640 f(x): -0.089095|grad f(p)|:0.002296255665502539
# 650 f(x): -0.089095|grad f(p)|:0.002296255664030171
# 660 f(x): -0.089095|grad f(p)|:0.002296255662557798
# 670 f(x): -0.089095|grad f(p)|:0.0022962556610854248
# 680 f(x): -0.089095|grad f(p)|:0.002296255659613032
# 690 f(x): -0.089095|grad f(p)|:0.0022962556581407082
# 700 f(x): -0.089095|grad f(p)|:0.002296255656668318
# 710 f(x): -0.089095|grad f(p)|:0.0022962556551958985
# 720 f(x): -0.089095|grad f(p)|:0.002296255653723546
# 730 f(x): -0.089095|grad f(p)|:0.002296255652251183
# 740 f(x): -0.089095|grad f(p)|:0.0022962556507788168
# 750 f(x): -0.089095|grad f(p)|:0.0022962556493064516
# 760 f(x): -0.089095|grad f(p)|:0.002296255647834051
# 770 f(x): -0.089095|grad f(p)|:0.002296255646361663
# 780 f(x): -0.089095|grad f(p)|:0.0022962556448893287
# 790 f(x): -0.089095|grad f(p)|:0.0022962556434169014
# 800 f(x): -0.089095|grad f(p)|:0.0022962556419445497
# 810 f(x): -0.089095|grad f(p)|:0.0022962556404721797
# 820 f(x): -0.089095|grad f(p)|:0.0022962556389998293
# 830 f(x): -0.089095|grad f(p)|:0.002296255637527417
# 840 f(x): -0.089095|grad f(p)|:0.0022962556360550538
# 850 f(x): -0.089095|grad f(p)|:0.00229625563458268
# 860 f(x): -0.089095|grad f(p)|:0.002296255633110347
# 870 f(x): -0.089095|grad f(p)|:0.0022962556316379126
# 880 f(x): -0.089095|grad f(p)|:0.0022962556301655314
# 890 f(x): -0.089095|grad f(p)|:0.0022962556286931957
# 900 f(x): -0.089095|grad f(p)|:0.002296255627220805
# 910 f(x): -0.089095|grad f(p)|:0.0022962556257484328
# 920 f(x): -0.089095|grad f(p)|:0.0022962556242760554
# 930 f(x): -0.089095|grad f(p)|:0.0022962556228036954
# 940 f(x): -0.089095|grad f(p)|:0.002296255621331309
# 950 f(x): -0.089095|grad f(p)|:0.0022962556198589204
# 960 f(x): -0.089095|grad f(p)|:0.0022962556183865473
# 970 f(x): -0.089095|grad f(p)|:0.002296255616914149
# 980 f(x): -0.089095|grad f(p)|:0.0022962556154418005
# 990 f(x): -0.089095|grad f(p)|:0.0022962556139694153
# 1000 f(x): -0.089095|grad f(p)|:0.0022962556124970193
To provide the Hessian in the high-level interface we need to prodive it as an anonymous function, since any struct
is considered to (eventually) be the also optional starting point. For space reasons, let’s also shorten the debug print to only iterations 7 and 14.
q4 = trust_regions(M, f, grad_f, (E, p, X) -> Hess_f(E, p, X), p0; objective_type=:Euclidean,
debug = [:Iteration, :Cost, :GradientNorm, 10, "\n"],
);
Initial f(x): -0.000727
# 10 f(x): -0.089234|grad f(p)|:0.0013561755210368023
q5 = trust_regions(M, f, grad_f, (M, Y, p, X) -> Hess_f(M, Y, p, X), p0;
evaluation=InplaceEvaluation(),
debug = [:Iteration, :Cost, :GradientNorm, 10, "\n"],
);
Initial f(x): -0.000727
# 10 f(x): -0.089234|grad f(p)|:0.0013561755210368012
Let’s also here compare them in benchmarks. Let’s here compare all variants in their (more performant) in-place versions.
@benchmark trust_regions($M, $f, $grad_f, $p0;
objective_type=:Euclidean,
evaluation=InplaceEvaluation(),
)
BenchmarkTools.Trial: 8 samples with 1 evaluation per sample.
Range (min … max): 626.868 ms … 723.111 ms ┊ GC (min … max): 7.44% … 6.05%
Time (median): 635.973 ms ┊ GC (median): 8.17%
Time (mean ± σ): 645.791 ms ± 31.647 ms ┊ GC (mean ± σ): 7.80% ± 0.77%
█ ███ ███ █
█▁███▁▁███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
627 ms Histogram: frequency by time 723 ms <
Memory estimate: 1.96 GiB, allocs estimate: 75417.
@benchmark trust_regions($M, $f, $grad_f, $((E, Y, p, X) -> Hess_f(E, Y, p, X)), $p0;
evaluation=InplaceEvaluation(),
objective_type=:Euclidean
)
BenchmarkTools.Trial: 225 samples with 1 evaluation per sample.
Range (min … max): 18.936 ms … 38.344 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 21.878 ms ┊ GC (median): 9.72%
Time (mean ± σ): 22.276 ms ± 2.463 ms ┊ GC (mean ± σ): 9.19% ± 3.05%
▂ ▄▄▄▇██▇▅▄
█▅▁▅▁███████████▅▁▁▁▅▅▅▁▅▅▁▅▁▁▁▅▁▁▁▁▅▁▁▁▁▅▁▁▁▅▁▁▅▁▁▁▁▁▁▁▁▁▅ ▆
18.9 ms Histogram: log(frequency) by time 36.2 ms <
Memory estimate: 44.92 MiB, allocs estimate: 12651.
@benchmark trust_regions($M, $f, $grad_f, $((M, Y, p, X) -> Hess_f(M, Y, p, X)), $p0;
evaluation=InplaceEvaluation(),
)
BenchmarkTools.Trial: 339 samples with 1 evaluation per sample.
Range (min … max): 12.741 ms … 27.930 ms ┊ GC (min … max): 0.00% … 27.68%
Time (median): 14.477 ms ┊ GC (median): 7.50%
Time (mean ± σ): 14.761 ms ± 1.996 ms ┊ GC (mean ± σ): 7.25% ± 5.24%
▁▂ ▂▄▅▃ ▇█ ▁▃▃
███████▇▆██████▇▄▄▃▄▅▁▁▁▄▁▁▁▂▁▁▁▁▁▁▁▁▁▃▃▃▃▁▁▁▂▁▁▁▁▁▁▂▂▁▁▂▂▂ ▃
12.7 ms Histogram: frequency by time 23.3 ms <
Memory estimate: 16.35 MiB, allocs estimate: 12633.
We see that Hessian approximation is quite costly, and Gradient and Hessian conversion somewhat costly; still, they also might serve as a good starting point, before deciding to delve into computing Riemannian gradients and Hessians.
Of course all 5 runs obtained solutions close by; one might consider the gradient based runs to not have fully converged.
[distance(M, q1, q) for q ∈ [q2,q3] ]
2-element Vector{Float64}:
8.835276992173579e-16
0.8953986994946139
[distance(M, q3, q) for q ∈ [q4,q5] ]
2-element Vector{Float64}:
1.1159334410358788
1.1159334410358788
Which we can also see in the final cost, comparing it to the Eigenvalue
[f(M, q) - λ for q ∈ [q1, q2, q3, q4, q5] ]
5-element Vector{Float64}:
6.211293387550776e-5
6.211293387559103e-5
0.0001451688142172225
-5.551115123125783e-17
-5.551115123125783e-17
Summary
We illustrated several possibilities to call solvers, with both Euclidean gradient and Hessian and Riemannian gradient and Hessian, allocating and in-place function. While the performance is better for the Riemannian case, the Euclidean one is a worthy alternative, when those are easier to compute.
Literature
- [Bou23]
- N. Boumal. An Introduction to Optimization on Smooth Manifolds. First Edition (Cambridge University Press, 2023).