Stiefel
Common and metric independent functions
Manifolds.Stiefel
— TypeStiefel{T,𝔽} <: AbstractDecoratorManifold{𝔽}
The Stiefel manifold consists of all $n×k$, $n ≥ k$ unitary matrices, i.e.
\[\operatorname{St}(n,k) = \bigl\{ p ∈ 𝔽^{n×k}\ \big|\ p^{\mathrm{H}}p = I_k \bigr\},\]
where $𝔽 ∈ \{ℝ, ℂ\}$, $⋅^{\mathrm{H}}$ denotes the complex conjugate transpose or Hermitian, and $I_k ∈ ℝ^{k×k}$ denotes the $k×k$ identity matrix.
The tangent space at a point $p ∈ \mathcal M$ is given by
\[T_p \mathcal M = \{ X ∈ 𝔽^{n×k} : p^{\mathrm{H}}X + \overline{X^{\mathrm{H}}p} = 0_k\},\]
where $0_k$ is the $k×k$ zero matrix and $\overline{⋅}$ the (elementwise) complex conjugate.
This manifold is modeled as an embedded manifold to the Euclidean
, i.e. several functions like the inner
product and the zero_vector
are inherited from the embedding.
The manifold is named after Eduard L. Stiefel (1909–1978).
Constructor
Stiefel(n, k, field=ℝ; parameter::Symbol=:type)
Generate the (real-valued) Stiefel manifold of $n×k$ dimensional orthonormal matrices.
Base.rand
— Methodrand(::Stiefel; vector_at=nothing, σ::Real=1.0)
When vector_at
is nothing
, return a random (Gaussian) point x
on the Stiefel
manifold M
by generating a (Gaussian) matrix with standard deviation σ
and return the orthogonalized version, i.e. return the Q component of the QR decomposition of the random matrix of size $n×k$.
When vector_at
is not nothing
, return a (Gaussian) random vector from the tangent space $T_{vector\_at}\mathrm{St}(n,k)$ with mean zero and standard deviation σ
by projecting a random Matrix onto the tangent vector at vector_at
.
ManifoldsBase.change_metric
— Methodchange_metric(M::Stiefel, ::EuclideanMetric, p X)
Change X
to the corresponding vector with respect to the metric of the Stiefel
M
, which is just the identity, since the manifold is isometrically embedded.
ManifoldsBase.change_representer
— Methodchange_representer(M::Stiefel, ::EuclideanMetric, p, X)
Change X
to the corresponding representer of a cotangent vector at p
. Since the Stiefel
manifold M
, is isometrically embedded, this is the identity
ManifoldsBase.check_point
— Methodcheck_point(M::Stiefel, p; kwargs...)
Check whether p
is a valid point on the Stiefel
M
=$\operatorname{St}(n,k)$, i.e. that it has the right AbstractNumbers
type and $p^{\mathrm{H}}p$ is (approximately) the identity, where $⋅^{\mathrm{H}}$ is the complex conjugate transpose. The settings for approximately can be set with kwargs...
.
ManifoldsBase.check_vector
— Methodcheck_vector(M::Stiefel, p, X; kwargs...)
Checks whether X
is a valid tangent vector at p
on the Stiefel
M
=$\operatorname{St}(n,k)$, i.e. the AbstractNumbers
fits and it (approximately) holds that $p^{\mathrm{H}}X + \overline{X^{\mathrm{H}}p} = 0$, where $⋅^{\mathrm{H}}$ denotes the Hermitian and $\overline{⋅}$ the (elementwise) complex conjugate. The settings for approximately can be set with kwargs...
.
ManifoldsBase.default_inverse_retraction_method
— Methoddefault_inverse_retraction_method(M::Stiefel)
Return PolarInverseRetraction
as the default inverse retraction for the Stiefel
manifold.
ManifoldsBase.default_retraction_method
— Methoddefault_retraction_method(M::Stiefel)
Return PolarRetraction
as the default retraction for the Stiefel
manifold.
ManifoldsBase.default_vector_transport_method
— Methoddefault_vector_transport_method(M::Stiefel)
Return the DifferentiatedRetractionVectorTransport
of the [PolarRetraction
](PolarRetraction
as the default vector transport method for the Stiefel
manifold.
ManifoldsBase.inverse_retract
— Methodinverse_retract(M::Stiefel, p, q, ::PolarInverseRetraction)
Compute the inverse retraction based on a singular value decomposition for two points p
, q
on the Stiefel
manifold M
. This follows the following approach: From the Polar retraction we know that
\[\operatorname{retr}_p^{-1}q = qs - t\]
if such a symmetric positive definite $k×k$ matrix exists. Since $qs - t$ is also a tangent vector at $p$ we obtain
\[p^{\mathrm{H}}qs + s(p^{\mathrm{H}}q)^{\mathrm{H}} + 2I_k = 0,\]
which can either be solved by a Lyapunov approach or a continuous-time algebraic Riccati equation.
This implementation follows the Lyapunov approach.
ManifoldsBase.inverse_retract
— Methodinverse_retract(M::Stiefel, p, q, ::QRInverseRetraction)
Compute the inverse retraction based on a qr decomposition for two points p
, q
on the Stiefel
manifold M
and return the resulting tangent vector in X
. The computation follows Algorithm 1 in [KFT13].
ManifoldsBase.is_flat
— Methodis_flat(M::Stiefel)
Return true if Stiefel
M
is one-dimensional.
ManifoldsBase.manifold_dimension
— Methodmanifold_dimension(M::Stiefel)
Return the dimension of the Stiefel
manifold M
=$\operatorname{St}(n,k,𝔽)$. The dimension is given by
\[\begin{aligned} \dim \mathrm{St}(n, k, ℝ) &= nk - \frac{1}{2}k(k+1)\\ \dim \mathrm{St}(n, k, ℂ) &= 2nk - k^2\\ \dim \mathrm{St}(n, k, ℍ) &= 4nk - k(2k-1) \end{aligned}\]
ManifoldsBase.representation_size
— Methodrepresentation_size(M::Stiefel)
Returns the representation size of the Stiefel
M
=$\operatorname{St}(n,k)$, i.e. (n,k)
, which is the matrix dimensions.
ManifoldsBase.retract
— Methodretract(::Stiefel, p, X, ::CayleyRetraction)
Compute the retraction on the Stiefel
that is based on the Cayley transform[Zhu16]. Using
\[ W_{p,X} = \operatorname{P}_pXp^{\mathrm{H}} - pX^{\mathrm{H}}\operatorname{P_p} \quad\text{where} \operatorname{P}_p = I - \frac{1}{2}pp^{\mathrm{H}}\]
the formula reads
\[ \operatorname{retr}_pX = \Bigl(I - \frac{1}{2}W_{p,X}\Bigr)^{-1}\Bigl(I + \frac{1}{2}W_{p,X}\Bigr)p.\]
It is implemented as the case $m=1$ of the PadeRetraction
.
ManifoldsBase.retract
— Methodretract(M::Stiefel, p, X, ::PadeRetraction{m})
Compute the retraction on the Stiefel
manifold M
based on the Padé approximation of order $m$ [ZD18]. Let $p_m$ and $q_m$ be defined for any matrix $A ∈ ℝ^{n×x}$ as
\[ p_m(A) = \sum_{k=0}^m \frac{(2m-k)!m!}{(2m)!(m-k)!}\frac{A^k}{k!}\]
and
\[ q_m(A) = \sum_{k=0}^m \frac{(2m-k)!m!}{(2m)!(m-k)!}\frac{(-A)^k}{k!}\]
respectively. Then the Padé approximation (of the matrix exponential $\exp(A)$) reads
\[ r_m(A) = q_m(A)^{-1}p_m(A)\]
Defining further
\[ W_{p,X} = \operatorname{P}_pXp^{\mathrm{H}} - pX^{\mathrm{H}}\operatorname{P_p} \quad\text{where } \operatorname{P}_p = I - \frac{1}{2}pp^{\mathrm{H}}\]
the retraction reads
\[ \operatorname{retr}_pX = r_m(W_{p,X})p\]
ManifoldsBase.retract
— Methodretract(M::Stiefel, p, X, ::PolarRetraction)
Compute the SVD-based retraction PolarRetraction
on the Stiefel
manifold M
. With $USV = p + X$ the retraction reads
\[\operatorname{retr}_p X = U\bar{V}^\mathrm{H}.\]
ManifoldsBase.retract
— Methodretract(M::Stiefel, p, X, ::QRRetraction)
Compute the QR-based retraction QRRetraction
on the Stiefel
manifold M
. With $QR = p + X$ the retraction reads
\[\operatorname{retr}_p X = QD,\]
where $D$ is a $n×k$ matrix with
\[D = \operatorname{diag}\bigl(\operatorname{sgn}(R_{ii}+0,5)_{i=1}^k \bigr),\]
where ``\operatorname{sgn}(p) = \begin{cases} 1 & \text{ for } p > 0,\ 0 & \text{ for } p = 0,\ -1& \text{ for } p < 0. \end{cases}``
ManifoldsBase.vector_transport_direction
— Methodvector_transport_direction(::Stiefel, p, X, d, ::DifferentiatedRetractionVectorTransport{CayleyRetraction})
Compute the vector transport given by the differentiated retraction of the CayleyRetraction
, cf. [Zhu16] Equation (17).
The formula reads
\[\operatorname{T}_{p,d}(X) = \Bigl(I - \frac{1}{2}W_{p,d}\Bigr)^{-1}W_{p,X}\Bigl(I - \frac{1}{2}W_{p,d}\Bigr)^{-1}p,\]
with
\[ W_{p,X} = \operatorname{P}_pXp^{\mathrm{H}} - pX^{\mathrm{H}}\operatorname{P_p} \quad\text{where } \operatorname{P}_p = I - \frac{1}{2}pp^{\mathrm{H}}\]
Since this is the differentiated retraction as a vector transport, the result will be in the tangent space at $q=\operatorname{retr}_p(d)$ using the CayleyRetraction
.
ManifoldsBase.vector_transport_direction
— Methodvector_transport_direction(M::Stiefel, p, X, d, DifferentiatedRetractionVectorTransport{PolarRetraction})
Compute the vector transport by computing the push forward of retract(::Stiefel, ::Any, ::Any, ::PolarRetraction)
Section 3.5 of [Zhu16]:
\[T_{p,d}^{\text{Pol}}(X) = q*Λ + (I-qq^{\mathrm{T}})X(1+d^\mathrm{T}d)^{-\frac{1}{2}},\]
where $q = \operatorname{retr}^{\mathrm{Pol}}_p(d)$, and $Λ$ is the unique solution of the Sylvester equation
\[ Λ(I+d^\mathrm{T}d)^{\frac{1}{2}} + (I + d^\mathrm{T}d)^{\frac{1}{2}} = q^\mathrm{T}X - X^\mathrm{T}q\]
ManifoldsBase.vector_transport_direction
— Methodvector_transport_direction(M::Stiefel, p, X, d, DifferentiatedRetractionVectorTransport{QRRetraction})
Compute the vector transport by computing the push forward of the retract(::Stiefel, ::Any, ::Any, ::QRRetraction)
, See [AMS08], p. 173, or Section 3.5 of [Zhu16].
\[T_{p,d}^{\text{QR}}(X) = q*\rho_{\mathrm{s}}(q^\mathrm{T}XR^{-1}) + (I-qq^{\mathrm{T}})XR^{-1},\]
where $q = \operatorname{retr}^{\mathrm{QR}}_p(d)$, $R$ is the $R$ factor of the QR decomposition of $p + d$, and
\[\bigl( \rho_{\mathrm{s}}(A) \bigr)_{ij} = \begin{cases} A_{ij}&\text{ if } i > j\\ 0 \text{ if } i = j\\ -A_{ji} \text{ if } i < j.\\ \end{cases}\]
ManifoldsBase.vector_transport_to
— Methodvector_transport_to(M::Stiefel, p, X, q, DifferentiatedRetractionVectorTransport{PolarRetraction})
Compute the vector transport by computing the push forward of the retract(M::Stiefel, ::Any, ::Any, ::PolarRetraction)
, see Section 4 of [HGA15] or Section 3.5 of [Zhu16]:
\[T_{q\gets p}^{\text{Pol}}(X) = q*Λ + (I-qq^{\mathrm{T}})X(1+d^\mathrm{T}d)^{-\frac{1}{2}},\]
where $d = \bigl( \operatorname{retr}^{\mathrm{Pol}}_p\bigr)^{-1}(q)$, and $Λ$ is the unique solution of the Sylvester equation
\[ Λ(I+d^\mathrm{T}d)^{\frac{1}{2}} + (I + d^\mathrm{T}d)^{\frac{1}{2}} = q^\mathrm{T}X - X^\mathrm{T}q\]
ManifoldsBase.vector_transport_to
— Methodvector_transport_to(M::Stiefel, p, X, q, DifferentiatedRetractionVectorTransport{QRRetraction})
Compute the vector transport by computing the push forward of the retract(M::Stiefel, ::Any, ::Any, ::QRRetraction)
, see [AMS08], p. 173, or Section 3.5 of [Zhu16].
\[T_{q \gets p}^{\text{QR}}(X) = q*\rho_{\mathrm{s}}(q^\mathrm{T}XR^{-1}) + (I-qq^{\mathrm{T}})XR^{-1},\]
where $d = \bigl(\operatorname{retr}^{\mathrm{QR}}\bigr)^{-1}_p(q)$, $R$ is the $R$ factor of the QR decomposition of $p+X$, and
\[\bigl( \rho_{\mathrm{s}}(A) \bigr)_{ij} = \begin{cases} A_{ij}&\text{ if } i > j\\ 0 \text{ if } i = j\\ -A_{ji} \text{ if } i < j.\\ \end{cases}\]
ManifoldsBase.vector_transport_to
— Methodvector_transport_to(M::Stiefel, p, X, q, ::ProjectionTransport)
Compute a vector transport by projection, i.e. project X
from the tangent space at p
by projection it onto the tangent space at q
.
Default metric: the Euclidean metric
The EuclideanMetric
is obtained from the embedding of the Stiefel manifold in $ℝ^{n,k}$.
Base.exp
— Methodexp(M::Stiefel, p, X)
Compute the exponential map on the Stiefel
{n,k,𝔽}
() manifold M
emanating from p
in tangent direction X
.
\[\exp_p X = \begin{pmatrix} p\\X \end{pmatrix} \operatorname{Exp} \left( \begin{pmatrix} p^{\mathrm{H}}X & - X^{\mathrm{H}}X\\ I_n & p^{\mathrm{H}}X\end{pmatrix} \right) \begin{pmatrix} \exp( -p^{\mathrm{H}}X) \\ 0_n\end{pmatrix},\]
where $\operatorname{Exp}$ denotes matrix exponential, $⋅^{\mathrm{H}}$ denotes the complex conjugate transpose or Hermitian, and $I_k$ and $0_k$ are the identity matrix and the zero matrix of dimension $k×k$, respectively.
ManifoldDiff.riemannian_Hessian
— MethodY = riemannian_Hessian(M::Stiefel, p, G, H, X)
riemannian_Hessian!(M::Stiefel, Y, p, G, H, X)
Compute the Riemannian Hessian $\operatorname{Hess} f(p)[X]$ given the Euclidean gradient $∇ f(\tilde p)$ in G
and the Euclidean Hessian $∇^2 f(\tilde p)[\tilde X]$ in H
, where $\tilde p, \tilde X$ are the representations of $p,X$ in the embedding,.
Here, we adopt Eq. (5.6) [Ngu23], where we use for the EuclideanMetric
$α_0=α_1=1$ in their formula. Then the formula reads
\[ \operatorname{Hess}f(p)[X] = \operatorname{proj}_{T_p\mathcal M}\Bigl( ∇^2f(p)[X] - \frac{1}{2} X \bigl((∇f(p))^{\mathrm{H}}p + p^{\mathrm{H}}∇f(p)\bigr) \Bigr).\]
Compared to Eq. (5.6) also the metric conversion simplifies to the identity.
ManifoldsBase.Weingarten
— MethodWeingarten(M::Stiefel, p, X, V)
Compute the Weingarten map $\mathcal W_p$ at p
on the Stiefel
M
with respect to the tangent vector $X \in T_p\mathcal M$ and the normal vector $V \in N_p\mathcal M$.
The formula is due to [AMT13] given by
\[\mathcal W_p(X,V) = -Xp^{\mathrm{H}}V - \frac{1}{2}p\bigl(X^\mathrm{H}V + V^{\mathrm{H}}X\bigr)\]
ManifoldsBase.get_basis
— Methodget_basis(M::Stiefel{<:Any,ℝ}, p, B::DefaultOrthonormalBasis)
Create the default basis using the parametrization for any $X ∈ T_p\mathcal M$. Set $p_\bot \in ℝ^{n×(n-k)}$ the matrix such that the $n×n$ matrix of the common columns $[p\ p_\bot]$ is an ONB. For any skew symmetric matrix $a ∈ ℝ^{k×k}$ and any $b ∈ ℝ^{(n-k)×k}$ the matrix
\[X = pa + p_\bot b ∈ T_p\mathcal M\]
and we can use the $\frac{1}{2}k(k-1) + (n-k)k = nk-\frac{1}{2}k(k+1)$ entries of $a$ and $b$ to specify a basis for the tangent space. using unit vectors for constructing both the upper matrix of $a$ to build a skew symmetric matrix and the matrix b, the default basis is constructed.
Since $[p\ p_⊥]$ is an automorphism on $ℝ^{n×p}$ the elements of $a$ and $b$ are orthonormal coordinates for the tangent space. To be precise exactly one element in the upper triangular entries of $a$ is set to $1$ its symmetric entry to $-1$ and we normalize with the factor $\frac{1}{\sqrt{2}}$ and for $b$ one can just use unit vectors reshaped to a matrix to obtain orthonormal set of parameters.
ManifoldsBase.injectivity_radius
— Methodinjectivity_radius(M::Stiefel)
Return the injectivity radius for the Stiefel
manifold M
, which is globally $π$ [ZS24].
ManifoldsBase.inverse_retract
— Methodinverse_retract(M::Stiefel, p, q, method::ProjectionInverseRetraction)
Compute a projection-based inverse retraction.
The inverse retraction is computed by projecting the logarithm map in the embedding to the tangent space at $p$.
ManifoldsBase.project
— Methodproject(M::Stiefel,p)
Projects p
from the embedding onto the Stiefel
M
, i.e. compute q
as the polar decomposition of $p$ such that $q^{\mathrm{H}}q$ is the identity, where $⋅^{\mathrm{H}}$ denotes the hermitian, i.e. complex conjugate transposed.
ManifoldsBase.project
— Methodproject(M::Stiefel, p, X)
Project X
onto the tangent space of p
to the Stiefel
manifold M
. The formula reads
\[\operatorname{proj}_{T_p\mathcal M}(X) = X - p \operatorname{Sym}(p^{\mathrm{H}}X),\]
where $\operatorname{Sym}(q)$ is the symmetrization of $q$, e.g. by $\operatorname{Sym}(q) = \frac{q^{\mathrm{H}}+q}{2}$.
ManifoldsBase.retract
— Methodretract(M::Stiefel, p, X, method::ProjectionRetraction)
Compute a projection-based retraction.
The retraction is computed by projecting the exponential map in the embedding to M
.
The canonical metric
Any $X∈T_p\mathcal M$, $p∈\mathcal M$, can be written as
\[X = pA + (I_n-pp^{\mathrm{T}})B, \quad A ∈ ℝ^{p×p} \text{ skew-symmetric}, \quad B ∈ ℝ^{n×p} \text{ arbitrary.}\]
In the EuclideanMetric
, the elements from $A$ are counted twice (i.e. weighted with a factor of 2). The canonical metric avoids this.
Manifolds.ApproximateLogarithmicMap
— TypeApproximateLogarithmicMap <: ApproximateInverseRetraction
An approximate implementation of the logarithmic map, which is an inverse_retract
ion. See inverse_retract(::MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},CanonicalMetric}, ::Any, ::Any, ::ApproximateLogarithmicMap)
for a use case.
Fields
max_iterations
– maximal number of iterations used in the approximationtolerance
– a tolerance used as a stopping criterion
Manifolds.CanonicalMetric
— TypeCanonicalMetric <: AbstractMetric
The Canonical Metric refers to a metric for the Stiefel
manifold, see[EAS98].
Base.exp
— Methodq = exp(M::MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},CanonicalMetric}, p, X)
exp!(M::MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},CanonicalMetric}, q, p, X)
Compute the exponential map on the Stiefel
(n, k)
manifold with respect to the CanonicalMetric
.
First, decompose The tangent vector $X$ into its horizontal and vertical component with respect to $p$, i.e.
\[X = pp^{\mathrm{T}}X + (I_n-pp^{\mathrm{T}})X,\]
where $I_n$ is the $n×n$ identity matrix. We introduce $A=p^{\mathrm{T}}X$ and $QR = (I_n-pp^{\mathrm{T}})X$ the qr
decomposition of the vertical component. Then using the matrix exponential $\operatorname{Exp}$ we introduce $B$ and $C$ as
\[\begin{pmatrix} B\\C \end{pmatrix} \coloneqq \operatorname{Exp}\left( \begin{pmatrix} A & -R^{\mathrm{T}}\\ R & 0 \end{pmatrix} \right) \begin{pmatrix}I_k\\0\end{pmatrix}\]
the exponential map reads
\[q = \exp_p X = pC + QB.\]
ManifoldDiff.riemannian_Hessian
— MethodY = riemannian_Hessian(M::MetricManifold{ℝ, Stiefel, CanonicalMetric}, p, G, H, X)
riemannian_Hessian!(M::MetricManifold{ℝ, Stiefel, CanonicalMetric}, Y, p, G, H, X)
Compute the Riemannian Hessian $\operatorname{Hess} f(p)[X]$ given the Euclidean gradient $∇ f(\tilde p)$ in G
and the Euclidean Hessian $∇^2 f(\tilde p)[\tilde X]$ in H
, where $\tilde p, \tilde X$ are the representations of $p,X$ in the embedding,.
Here, we adopt Eq. (5.6) [Ngu23], for the CanonicalMetric
$α_0=1, α_1=\frac{1}{2}$ in their formula. The formula reads
\[ \operatorname{Hess}f(p)[X] = \operatorname{proj}_{T_p\mathcal M}\Bigl( ∇^2f(p)[X] - \frac{1}{2} X \bigl( (∇f(p))^{\mathrm{H}}p + p^{\mathrm{H}}∇f(p)\bigr) - \frac{1}{2} \bigl( P ∇f(p) p^{\mathrm{H}} + p ∇f(p))^{\mathrm{H}} P)X \Bigr),\]
where $P = I-pp^{\mathrm{H}}$.
ManifoldsBase.inner
— Methodinner(M::MetricManifold{ℝ, Stiefel{<:Any,ℝ}, X, CanonicalMetric}, p, X, Y)
Compute the inner product on the Stiefel
manifold with respect to the CanonicalMetric
. The formula reads
\[g_p(X,Y) = \operatorname{tr}\bigl( X^{\mathrm{T}}(I_n - \frac{1}{2}pp^{\mathrm{T}})Y \bigr).\]
ManifoldsBase.inverse_retract
— MethodX = inverse_retract(M::MetricManifold{ℝ, Stiefel{<:Any,ℝ}, CanonicalMetric}, p, q, a::ApproximateLogarithmicMap)
inverse_retract!(M::MetricManifold{ℝ, Stiefel{<:Any,ℝ}, X, CanonicalMetric}, p, q, a::ApproximateLogarithmicMap)
Compute an approximation to the logarithmic map on the Stiefel
(n, k)
manifold with respect to the CanonicalMetric
using a matrix-algebraic based approach to an iterative inversion of the formula of the exp
.
The algorithm is derived in [Zim17] and it uses the max_iterations
and the tolerance
field from the ApproximateLogarithmicMap
.
The submersion or normal metric
Manifolds.StiefelSubmersionMetric
— TypeStiefelSubmersionMetric{T<:Real} <: RiemannianMetric
The submersion (or normal) metric family on the Stiefel
manifold.
The family, with a single real parameter $α>-1$, has two special cases:
- $α = -\frac{1}{2}$:
EuclideanMetric
- $α = 0$:
CanonicalMetric
The family was described in [HML21]. This implementation follows the description in [ZH22].
Constructor
StiefelSubmersionMetric(α)
Construct the submersion metric on the Stiefel manifold with the parameter $α$.
Base.exp
— Methodq = exp(M::MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},<:StiefelSubmersionMetric}, p, X)
exp!(M::MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},<:StiefelSubmersionMetric}, q, p, X)
Compute the exponential map on the Stiefel(n,k)
manifold with respect to the StiefelSubmersionMetric
.
The exponential map is given by
\[\exp_p X = \operatorname{Exp}\bigl( -\frac{2α+1}{α+1} p p^\mathrm{T} X p^\mathrm{T} + X p^\mathrm{T} - p X^\mathrm{T} \bigr) p \operatorname{Exp}\bigl(\frac{\alpha}{\alpha+1} p^\mathrm{T} X\bigr)\]
This implementation is based on [ZH22].
For $k < \frac{n}{2}$ the exponential is computed more efficiently using StiefelFactorization
.
Base.log
— Methodlog(M::MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},<:StiefelSubmersionMetric}, p, q; kwargs...)
Compute the logarithmic map on the Stiefel(n,k)
manifold with respect to the StiefelSubmersionMetric
.
The logarithmic map is computed using ShootingInverseRetraction
. For $k ≤ \lfloor\frac{n}{2}\rfloor$, this is sped up using the $k$-shooting method of [ZH22]. Keyword arguments are forwarded to ShootingInverseRetraction
; see that documentation for details. Their defaults are:
num_transport_points=4
tolerance=sqrt(eps())
max_iterations=1_000
ManifoldDiff.riemannian_Hessian
— MethodY = riemannian_Hessian(M::MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},StiefelSubmersionMetric}, p, G, H, X)
riemannian_Hessian!(MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},StiefelSubmersionMetric}, Y, p, G, H, X)
Compute the Riemannian Hessian $\operatorname{Hess} f(p)[X]$ given the Euclidean gradient $∇ f(\tilde p)$ in G
and the Euclidean Hessian $∇^2 f(\tilde p)[\tilde X]$ in H
, where $\tilde p, \tilde X$ are the representations of $p,X$ in the embedding,.
Here, we adopt Eq. (5.6) [Ngu23], for the CanonicalMetric
$α_0=1, α_1=\frac{1}{2}$ in their formula. The formula reads
\[ \operatorname{Hess}f(p)[X] = \operatorname{proj}_{T_p\mathcal M}\Bigl( ∇^2f(p)[X] - \frac{1}{2} X \bigl( (∇f(p))^{\mathrm{H}}p + p^{\mathrm{H}}∇f(p)\bigr) - \frac{2α+1}{2(α+1)} \bigl( P ∇f(p) p^{\mathrm{H}} + p ∇f(p))^{\mathrm{H}} P)X \Bigr),\]
where $P = I-pp^{\mathrm{H}}$.
Compared to Eq. (5.6) we have that their $α_0 = 1$and $\alpha_1 = \frac{2α+1}{2(α+1)} + 1$.
ManifoldsBase.inner
— Methodinner(M::MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},<:StiefelSubmersionMetric}, p, X, Y)
Compute the inner product on the Stiefel
manifold with respect to the StiefelSubmersionMetric
. The formula reads
\[g_p(X,Y) = \operatorname{tr}\bigl( X^{\mathrm{T}}(I_n - \frac{2α+1}{2(α+1)}pp^{\mathrm{T}})Y \bigr),\]
where $α$ is the parameter of the metric.
ManifoldsBase.inverse_retract
— Methodinverse_retract(
M::MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},<:StiefelSubmersionMetric},
p,
q,
method::ShootingInverseRetraction,
)
Compute the inverse retraction using ShootingInverseRetraction
.
In general the retraction is computed using the generic shooting method.
inverse_retract(
M::MetricManifold{ℝ,<:Stiefel{<:Any,ℝ},<:StiefelSubmersionMetric},
p,
q,
method::ShootingInverseRetraction{
ExponentialRetraction,
ProjectionInverseRetraction,
<:Union{ProjectionTransport,ScaledVectorTransport{ProjectionTransport}},
},
)
Compute the inverse retraction using ShootingInverseRetraction
more efficiently.
For $k < \frac{n}{2}$ the retraction is computed more efficiently using StiefelFactorization
.
Internal types and functions
Manifolds.StiefelFactorization
— TypeStiefelFactorization{UT,XT} <: AbstractManifoldPoint
Represent points (and vectors) on Stiefel(n, k)
with $2k×k$ factors [ZH22].
Given a point $p ∈ \mathrm{St}(n, k)$ and another matrix $B ∈ ℝ^{n×k}$ for $k ≤ \lfloor\frac{n}{2}\rfloor$ the factorization is
\[\begin{aligned} B &= UZ\\ U &= \begin{bmatrix}p & Q\end{bmatrix} ∈ \mathrm{St}(n, 2k)\\ Z &= \begin{bmatrix}Z_1 \\ Z_2\end{bmatrix}, \quad Z_1,Z_2 ∈ ℝ^{k×k}. \end{aligned}\]
If $B ∈ \mathrm{St}(n, k)$, then $Z ∈ \mathrm{St}(2k, k)$. Note that not every matrix $B$ can be factorized in this way.
For a fixed $U$, if $r ∈ \mathrm{St}(n, k)$ has the factor $Z_r ∈ \mathrm{St}(2k, k)$, then $X_r ∈ T_r \mathrm{St}(n, k)$ has the factor $Z_{X_r} ∈ T_{Z_r} \mathrm{St}(2k, k)$.
$Q$ is determined by choice of a second matrix $A ∈ ℝ^{n×k}$ with the decomposition
\[\begin{aligned} A &= UZ\\ Z_1 &= p^\mathrm{T} A \\ Q Z_2 &= (I - p p^\mathrm{T}) A, \end{aligned}\]
where here $Q Z_2$ is the any decomposition that produces $Q ∈ \mathrm{St}(n, k)$, for which we choose the QR decomposition.
This factorization is useful because it is closed under addition, subtraction, scaling, projection, and the Riemannian exponential and logarithm under the StiefelSubmersionMetric
. That is, if all matrices involved are factorized to have the same $U$, then all of these operations and any algorithm that depends only on them can be performed in terms of the $2k×k$ matrices $Z$. For $n ≫ k$, this can be much more efficient than working with the full matrices.
This type is intended strictly for internal use and should not be directly used.
Manifolds.stiefel_factorization
— Methodstiefel_factorization(p, x) -> StiefelFactorization
Compute the StiefelFactorization
of $x$ relative to the point $p$.
Literature
- [AMT13]
- P. -.-A. Absil, R. Mahony and J. Trumpf. An Extrinsic Look at the Riemannian Hessian. In: Geometric Science of Information, edited by F. Nielsen and F. Barbaresco (Springer Berlin Heidelberg, 2013); pp. 361–368.
- [AMS08]
- P.-A. Absil, R. Mahony and R. Sepulchre. Optimization Algorithms on Matrix Manifolds (Princeton University Press, 2008), available online at press.princeton.edu/chapters/absil/.
- [EAS98]
- A. Edelman, T. A. Arias and S. T. Smith. The Geometry of Algorithms with Orthogonality Constraints. SIAM Journal on Matrix Analysis and Applications 20, 303–353 (1998), arXiv:806030.
- [HGA15]
- W. Huang, K. A. Gallivan and P.-A. Absil. A Broyden Class of Quasi-Newton Methods for Riemannian Optimization. SIAM Journal on Optimization 25, 1660–1685 (2015).
- [HML21]
- K. Hüper, I. Markina and F. S. Leite. A Lagrangian approach to extremal curves on Stiefel manifolds. Journal of Geometric Mechanics 13, 55 (2021).
- [KFT13]
- T. Kaneko, S. Fiori and T. Tanaka. Empirical Arithmetic Averaging Over the Compact Stiefel Manifold. IEEE Transactions on Signal Processing 61, 883–894 (2013).
- [Ngu23]
- D. Nguyen. Operator-Valued Formulas for Riemannian Gradient and Hessian and Families of Tractable Metrics in Riemannian Optimization. Journal of Optimization Theory and Applications 198, 135–164 (2023), arXiv:2009.10159.
- [Zhu16]
- X. Zhu. A Riemannian conjugate gradient method for optimization on the Stiefel manifold. Computational Optimization and Applications 67, 73–110 (2016).
- [ZD18]
- X. Zhu and C. Duan. On matrix exponentials and their approximations related to optimization on the Stiefel manifold. Optimization Letters 13, 1069–1083 (2018).
- [Zim17]
- R. Zimmermann. A Matrix-Algebraic Algorithm for the Riemannian Logarithm on the Stiefel Manifold under the Canonical Metric. SIAM J. Matrix Anal. Appl. 38, 322–342 (2017), arXiv:1604.05054.
- [ZH22]
- R. Zimmermann and K. Hüper. Computing the Riemannian Logarithm on the Stiefel Manifold: Metrics, Methods, and Performance. SIAM Journal on Matrix Analysis and Applications 43, 953–980 (2022), arXiv:2103.12046.
- [ZS24]
- R. Zimmermann and J. Stoye. The injectivity radius of the compact Stiefel manifold under the Euclidean metric, arXiv Preprint (2024), arXiv:2405.02268.