Sparse PCA
Paula John, Hajg Jasa 2025-10-01
Introduction
In this example we use the Nonconvex Riemannian Proximal Gradient (NCRPG) method [BJJP25b] and compare it to the Riemannian Proximal Gradient (RPG) method [HW21]. It reproduces the results from [BJJP25b], Section 6.1; the numbers may vary slightly since this notebook was run on a different machine.
using PrettyTables
using BenchmarkTools
using CSV, DataFrames
using ColorSchemes, Plots, LaTeXStrings
using Random, LinearAlgebra, LRUCache, Distributions
using ManifoldDiff, Manifolds, Manopt, ManoptExamples
The Problem
Let $\mathcal M = \mathrm{OB}(n,r)$ be the oblique manifold, i.e., the set of $n \times r$ matrices with unit-norm columns. Let $g \colon \mathcal M \to \mathbb R$ be defined by
\[g(X) = \frac{1}{2} \Vert X^\top A^\top A X - D^2 \Vert^2,\]
where $A \in \mathbb R^{m \times n}$ is a data matrix, $D = \mathrm{diag}(d_1, \ldots, d_r)$ is a diagonal matrix containing the top $r$ singular values of $A$, and $\Vert \cdot \Vert$ is the Frobenius norm.
Let $h \colon \mathcal M \to \mathbb R$,
\[h(X) = \mu \Vert X \Vert_1,\]
be the sparsity-enforcing term given by the (elementwise) $\ell_1$-norm, where $\mu \ge 0$ is a regularization parameter.
We define the total objective function as $f = g + h$. The goal is to find a minimizer of $f$ on $\mathcal M$, which, heuristically, is a point that diagonalizes $A^\top A$ as much as possible while being sparse.
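Both methods are splitting schemes: each iteration takes a gradient step on the smooth part $g$ followed by a proximal step on the nonsmooth part $h$. Schematically, with step sizes $\lambda_k > 0$, an iteration reads
\[X_{k+1} \in \operatorname{prox}_{\lambda_k h}\bigl(\mathcal R_{X_k}(-\lambda_k \operatorname{grad} g(X_k))\bigr), \qquad \operatorname{prox}_{\lambda h}(Y) = \operatorname{arg\,min}_{X \in \mathcal M} \Bigl(h(X) + \frac{1}{2\lambda}\mathrm{d}(X,Y)^2\Bigr),\]
where $\mathcal R$ denotes a retraction and $\mathrm{d}$ the Riemannian distance. We refer to [BJJP25b] and [HW21] for the precise formulations; the two methods differ, in particular, in how the proximal map is defined and computed.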
Numerical Experiment
We initialize the experiment parameters, as well as some utility functions.
# Set random seed for reproducibility
random_seed = 1520
Random.seed!(random_seed)
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 2.0
m_tests = 10 # number of tests for each parameter setting
means = 20 # number of samples, i.e., rows of the data matrix A
atol = 1e-7
max_iters = 100000
n_p_array = [(100,5), (200,5), (300, 5)]
μs = [0.1, 0.5, 1.0]
We define a function to generate the test data for the Sparse PCA problem.
function gen_test_data_SPCA(n, m, p)
    # Draw an m × n data matrix with i.i.d. standard normal entries
    A = rand(Normal(0, 1.0), (m, n))
    # Standardize each column to zero mean and unit standard deviation
    for i in 1:n
        A[:, i] = A[:, i] .- mean(A[:, i])
        A[:, i] = A[:, i] / std(A[:, i])
    end
    svdA = svd(A)
    # Top p right singular vectors (the principal components) and singular values
    PCs = svdA.Vt[1:p, :]'
    d = svdA.S[1:p]
    return A, PCs, d
end
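As a quick illustrative check (not part of the original experiment; the `_chk` names are chosen ad hoc), one can verify that the generated data is standardized column-wise:

A_chk, PCs_chk, d_chk = gen_test_data_SPCA(100, 20, 5)
maximum(abs.(mean(A_chk; dims = 1)))     # column means ≈ 0
maximum(abs.(std(A_chk; dims = 1) .- 1)) # column standard deviations ≈ 1
size(PCs_chk), length(d_chk)             # ((100, 5), 5)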
We define the proximal operator for the $\ell_1$-norm on the oblique manifold, following [BJJP25b].
# Returns a function computing prox_{λμ||.||_1}(X) on the oblique manifold OB(n, p)
# with respect to the Riemannian distance
function prox_l1_OB(n, p, μ; tol = 1e-10, max_iters = 10)
    return function prox_l1_OB_μ(M, λ, X)
        μλ = μ * λ
        prox_X = Array{Float64}(undef, n, p)
        # The prox splits over the columns, i.e., over the sphere factors
        for k in 1:p
            x = X[:, k]
            t = μλ
            px_t = X[:, k]
            # Fixed-point iteration on the threshold parameter t
            for _ in 1:max_iters
                t_old = t
                # Euclidean soft-thresholding with threshold t
                z = abs.(x) .- t
                prox_Rn_t = (z .> 0) .* sign.(x) .* z
                # Project the thresholded vector back onto the sphere
                px_t = prox_Rn_t / norm(prox_Rn_t)
                xpx_t = x'px_t
                if xpx_t < 1
                    # Update t to account for the Riemannian distance on the sphere
                    t = μλ * sqrt(1 - xpx_t^2) / acos(xpx_t)
                else
                    # x is a fixed point of the prox
                    px_t = x
                    prox_X[:, k] = x
                    break
                end
                if abs(t - t_old) < tol
                    prox_X[:, k] = px_t
                    break
                end
            end
            prox_X[:, k] = px_t
        end
        return prox_X
    end
end
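A small illustrative sanity check (again with ad-hoc `_chk` names, not from the original notebook): the prox should map back onto the oblique manifold, that is, all columns keep unit norm.

M_chk = Oblique(10, 3)
prox_chk = prox_l1_OB(10, 3, 0.5)
X_chk = rand(M_chk)
P_chk = prox_chk(M_chk, 0.1, X_chk)
maximum(abs.([norm(P_chk[:, k]) - 1 for k in 1:3])) # ≈ 0, so P_chk lies on OB(10, 3)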
# Objective, gradient, and proxes
g(M, X, H, D) = 0.5 * norm(X'H * X - D)^2
function grad_g(M, X, H, D)
    HX = H * X
    # Project the Euclidean gradient onto the tangent space at X
    return project(M, X, 2 * HX * (X'HX - D))
end
h(M, X, μ) = μ * norm(X, 1)
f(M, X, H, D, μ) = g(M, X, H, D) + h(M, X, μ)
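Since `grad_g` projects the Euclidean gradient onto the tangent space, each column of the result should be orthogonal to the corresponding column of $X$. A quick illustrative spot check (hypothetical `_chk` names):

OB_chk = Oblique(8, 3)
A_chk = randn(5, 8)
H_chk = A_chk'A_chk
D_chk = diagm(svd(H_chk).S[1:3])
X_chk = rand(OB_chk)
G_chk = grad_g(OB_chk, X_chk, H_chk, D_chk)
maximum(abs.([X_chk[:, k]'G_chk[:, k] for k in 1:3])) # ≈ 0: each column is tangent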
We introduce an implementation of the RPG method for the Sparse PCA problem on the oblique manifold, following [HW21].
# Implementation of the proximal operator for the ℓ1-norm on the oblique manifold following [HW21];
# updates X in place with the proximal gradient step and returns the size of the change
function RPG_prox_OB(S, X, grad_fX, λ, L, n, p; max_iters = 10, tol = 1e-10)
    λ̃ = λ / L
    d = 0.0
    # The subproblem splits over the columns, i.e., over the sphere factors
    for k in 1:p
        x = X[:, k]
        ξ_x = 1/L * grad_fX[:, k]
        neg∇h = (x - ξ_x) / λ̃
        i_max = argmax(abs.(neg∇h))
        if abs(neg∇h[i_max]) <= 1.0
            # Soft-thresholding maps to zero; take the closest signed basis vector instead
            y = sign(neg∇h[i_max]) * (1:n .== i_max)
        else
            # Euclidean soft-thresholding followed by normalization
            z = abs.(neg∇h) .- 1.0
            Act_set = z .> 0
            y = Act_set .* sign.(neg∇h) .* z
            y = y / norm(y)
        end
        # Fixed-point iteration for the prox subproblem
        for j in 1:max_iters
            xty = x'y
            if xty >= 1
                sy = -1
                ty = 1
            else
                ξty = ξ_x'y
                acosxty = acos(xty)
                α = 1 - xty^2
                sy = -acosxty / sqrt(α) - ξty / α + acosxty * ξty * xty / sqrt(α^3)
                ty = acosxty / sqrt(α)
            end
            neg∇h = -(sy * x + ty * ξ_x) / λ̃
            i_max = argmax(abs.(neg∇h))
            if abs(neg∇h[i_max]) <= 1.0
                y_new = sign(neg∇h[i_max]) * (1:n .== i_max)
            else
                z = abs.(neg∇h) .- 1.0
                Act_set = z .> 0
                y_new = Act_set .* sign.(neg∇h) .* z
                y_new = y_new / norm(y_new)
            end
            if max(abs(xty - x'y_new), abs(ξ_x'y - ξ_x'y_new)) < tol
                break
            end
            y = y_new
        end
        d += distance(S, x, y)^2
        X[:, k] = y
    end
    return sqrt(d)
end
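Note that, in contrast to `prox_l1_OB` above, this routine overwrites `X` in place and returns the size of the resulting step; the solver `RPG_SPCA_OB` below uses this returned change for its stopping criterion.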
#
# RPG implementation for Sparse PCA on the oblique manifold
function RPG_SPCA_OB(M, H, D, μ, L, start, prox_fun; max_iters = 1000, stop = 1e-8, record = false)
    n, p = size(start)
    S = Sphere(n - 1)
    cost_fun(M, X) = 0.5 * norm(X'H * X - D)^2 + μ * norm(X, 1)
    function grad_f(M, X, D = D)
        HX = H * X
        return project(M, X, 2 * HX * (X'HX - D))
    end
    X = copy(start)
    if !record
        for i in 1:max_iters
            # One proximal gradient step; prox_fun updates X in place
            change = prox_fun(S, X, grad_f(M, X, D), μ, L, n, p)
            if L * change < stop
                return X, i
            end
        end
        return X, max_iters
    else
        Iterates = []
        for i in 1:max_iters
            change = prox_fun(S, X, grad_f(M, X, D), μ, L, n, p)
            push!(Iterates, copy(X))
            if L * change < stop
                return Iterates, i
            end
        end
        return Iterates, max_iters
    end
end
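For illustration, a single run on a small instance might look as follows (a sketch with ad-hoc `_demo` names and parameters, not part of the benchmark below):

OB_demo = Oblique(50, 3)
A_demo, _, _ = gen_test_data_SPCA(50, 20, 3)
H_demo = A_demo'A_demo / norm(A_demo'A_demo) * 10
D_demo = diagm(svd(H_demo).S[1:3])
X_demo, it_demo = RPG_SPCA_OB(OB_demo, H_demo, D_demo, 0.5, 2 * tr(H_demo), rand(OB_demo), RPG_prox_OB)
f(OB_demo, X_demo, H_demo, D_demo, 0.5) # objective value at the returned point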
We set up some variables to collect the results of the experiments and initialize the data frames.
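Since the corresponding setup cell is not shown in the rendered example, here is a minimal sketch that matches the variable and column names used below; the location of `results_folder` is an assumption.

# Sketch of the setup cell; the results folder location is an assumption
results_folder = joinpath(@__DIR__, "results")
mkpath(results_folder)
# One data frame per solver, with the columns pushed in the experiment loop below
empty_results() = DataFrame(
    μ = Float64[], n = Float64[], p = Float64[],
    time = Float64[], objective = Float64[], sparsity = Float64[],
    iterations = Float64[], orthogonality = Float64[],
)
df_results_RPG = empty_results()
df_results_NCRPG = empty_results()
df_results_NCRPG_bt = empty_results()
# Per-μ accumulators, reset after each (n, p) setting
time_RPG_tmp = zeros(length(μs)); time_NCRPG_tmp = zeros(length(μs)); time_NCRPG_bt_tmp = zeros(length(μs))
obj_RPG_tmp = zeros(length(μs)); obj_NCRPG_tmp = zeros(length(μs)); obj_NCRPG_bt_tmp = zeros(length(μs))
spar_RPG_tmp = zeros(length(μs)); spar_NCRPG_tmp = zeros(length(μs)); spar_NCRPG_bt_tmp = zeros(length(μs))
it_RPG_tmp = zeros(length(μs)); it_NCRPG_tmp = zeros(length(μs)); it_NCRPG_bt_tmp = zeros(length(μs))
orth_RPG_tmp = zeros(length(μs)); orth_NCRPG_tmp = zeros(length(μs)); orth_NCRPG_bt_tmp = zeros(length(μs))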
We then run the experiments.
for (n, p) in n_p_array
    # Define manifold
    OB = Oblique(n, p)
    for m in 1:m_tests
        # Construct problem
        A, PCs, d = gen_test_data_SPCA(n, means, p)
        H = A'A / norm(A'A) * 10
        D = diagm(svd(H).S[1:p]) # plays the role of D² in the problem statement above
        L = 2 * tr(H) # smoothness constant used for the constant step size 1/L
        for (c, μ) in enumerate(μs)
            # Localize functions; fresh names avoid shadowing the global definitions
            g_loc(M, X) = g(M, X, H, D)
            grad_g_loc(M, X) = grad_g(M, X, H, D)
            h_loc(M, X) = h(M, X, μ)
            prox_norm1_NCRPG = prox_l1_OB(n, p, μ)
            f_loc(M, X) = f(M, X, H, D, μ)
            #
            # Parameters
            step_size = 1 / L
            init_step_size_bt = 10 * step_size
            stop_step_size_bt = atol
            stop_RPG = atol
            stop_NCRPG = atol
            stop_NCRPG_bt = atol
            #
            # Fix starting point
            start = rand(OB)
            #
            # Optimization
            # NCRPG
            rec_NCRPG = proximal_gradient_method(OB, f_loc, g_loc, grad_g_loc, start;
                prox_nonsmooth = prox_norm1_NCRPG,
                stepsize = ConstantLength(step_size),
                record = [:Iteration, :Iterate],
                return_state = true,
                stopping_criterion = StopAfterIteration(max_iters) | StopWhenGradientMappingNormLess(stop_NCRPG),
            )
            # Benchmark NCRPG
            bm_NCRPG = @benchmark proximal_gradient_method($OB, $f_loc, $g_loc, $grad_g_loc, $start;
                prox_nonsmooth = $prox_norm1_NCRPG,
                stepsize = ConstantLength($step_size),
                stopping_criterion = StopAfterIteration($max_iters) | StopWhenGradientMappingNormLess($stop_NCRPG),
            )
            # NCRPG with backtracking
            rec_NCRPG_bt = proximal_gradient_method(OB, f_loc, g_loc, grad_g_loc, start;
                prox_nonsmooth = prox_norm1_NCRPG,
                stepsize = ProximalGradientMethodBacktracking(;
                    strategy = :nonconvex,
                    initial_stepsize = init_step_size_bt,
                    stop_when_stepsize_less = stop_step_size_bt,
                ),
                record = [:Iteration, :Iterate],
                return_state = true,
                stopping_criterion = StopAfterIteration(max_iters) | StopWhenGradientMappingNormLess(stop_NCRPG_bt),
            )
            # Benchmark NCRPG with backtracking
            bm_NCRPG_bt = @benchmark proximal_gradient_method($OB, $f_loc, $g_loc, $grad_g_loc, $start;
                prox_nonsmooth = $prox_norm1_NCRPG,
                stepsize = ProximalGradientMethodBacktracking(;
                    strategy = :nonconvex,
                    initial_stepsize = $init_step_size_bt,
                    stop_when_stepsize_less = $stop_step_size_bt,
                ),
                stopping_criterion = StopAfterIteration($max_iters) | StopWhenGradientMappingNormLess($stop_NCRPG_bt),
            )
            # RPG
            Iterates_RPG, it_RPG = RPG_SPCA_OB(OB, H, D, μ, L, start, RPG_prox_OB;
                max_iters = max_iters,
                stop = stop_RPG,
                record = true,
            )
            bm_RPG = @benchmark RPG_SPCA_OB($OB, $H, $D, $μ, $L, $start, $RPG_prox_OB;
                max_iters = $max_iters,
                stop = $stop_RPG,
            )
            #
            # Collect test results
            Iterates_NCRPG = get_record(rec_NCRPG, :Iteration, :Iterate)
            res_NCRPG = Iterates_NCRPG[end]
            time_NCRPG = time(median(bm_NCRPG)) / 1e9 # median runtime in seconds
            obj_NCRPG = f_loc(OB, res_NCRPG)
            spar_NCRPG = sum(abs.(res_NCRPG) .< 1e-8) / (n * p) # fraction of (numerically) zero entries
            it_NCRPG = length(Iterates_NCRPG)
            orth_NCRPG = norm(res_NCRPG' * res_NCRPG - I(p)) # deviation from orthogonality
            # NCRPG with backtracking
            Iterates_NCRPG_bt = get_record(rec_NCRPG_bt, :Iteration, :Iterate)
            res_NCRPG_bt = Iterates_NCRPG_bt[end]
            time_NCRPG_bt = time(median(bm_NCRPG_bt)) / 1e9
            obj_NCRPG_bt = f_loc(OB, res_NCRPG_bt)
            spar_NCRPG_bt = sum(abs.(res_NCRPG_bt) .< 1e-8) / (n * p)
            it_NCRPG_bt = length(Iterates_NCRPG_bt)
            orth_NCRPG_bt = norm(res_NCRPG_bt' * res_NCRPG_bt - I(p))
            # RPG
            res_RPG = Iterates_RPG[end]
            time_RPG = time(median(bm_RPG)) / 1e9
            obj_RPG = f_loc(OB, res_RPG)
            spar_RPG = sum(abs.(res_RPG) .< 1e-8) / (n * p)
            orth_RPG = norm(res_RPG' * res_RPG - I(p))
            #
            # Update results
            # Time values
            time_NCRPG_tmp[c] += time_NCRPG
            time_NCRPG_bt_tmp[c] += time_NCRPG_bt
            time_RPG_tmp[c] += time_RPG
            # Objective values
            obj_NCRPG_tmp[c] += obj_NCRPG
            obj_NCRPG_bt_tmp[c] += obj_NCRPG_bt
            obj_RPG_tmp[c] += obj_RPG
            # Sparsity values
            spar_NCRPG_tmp[c] += spar_NCRPG
            spar_NCRPG_bt_tmp[c] += spar_NCRPG_bt
            spar_RPG_tmp[c] += spar_RPG
            # Orthogonality values
            orth_NCRPG_tmp[c] += orth_NCRPG
            orth_NCRPG_bt_tmp[c] += orth_NCRPG_bt
            orth_RPG_tmp[c] += orth_RPG
            # Iteration values
            it_NCRPG_tmp[c] += it_NCRPG
            it_NCRPG_bt_tmp[c] += it_NCRPG_bt
            it_RPG_tmp[c] += it_RPG
        end
    end
    # Store the averages over the m_tests runs
    for (c, μ) in enumerate(μs)
        push!(df_results_RPG,
            [μ, n, p, time_RPG_tmp[c] / m_tests, obj_RPG_tmp[c] / m_tests, spar_RPG_tmp[c] / m_tests, it_RPG_tmp[c] / m_tests, orth_RPG_tmp[c] / m_tests],
        )
        push!(df_results_NCRPG,
            [μ, n, p, time_NCRPG_tmp[c] / m_tests, obj_NCRPG_tmp[c] / m_tests, spar_NCRPG_tmp[c] / m_tests, it_NCRPG_tmp[c] / m_tests, orth_NCRPG_tmp[c] / m_tests],
        )
        push!(df_results_NCRPG_bt,
            [μ, n, p, time_NCRPG_bt_tmp[c] / m_tests, obj_NCRPG_bt_tmp[c] / m_tests, spar_NCRPG_bt_tmp[c] / m_tests, it_NCRPG_bt_tmp[c] / m_tests, orth_NCRPG_bt_tmp[c] / m_tests],
        )
    end
    #
    # Reset data collection variables
    for tmp in (
        time_RPG_tmp, time_NCRPG_tmp, time_NCRPG_bt_tmp,
        obj_RPG_tmp, obj_NCRPG_tmp, obj_NCRPG_bt_tmp,
        spar_RPG_tmp, spar_NCRPG_tmp, spar_NCRPG_bt_tmp,
        it_RPG_tmp, it_NCRPG_tmp, it_NCRPG_bt_tmp,
        orth_RPG_tmp, orth_NCRPG_tmp, orth_NCRPG_bt_tmp,
    )
        fill!(tmp, 0.0)
    end
end
We export the results to CSV files.
# Sort the data frames by the parameter μ and create the final results data frames
df_results_NCRPG = sort(df_results_NCRPG, :μ)
df_results_NCRPG_bt = sort(df_results_NCRPG_bt, :μ)
df_results_RPG = sort(df_results_RPG, :μ)
df_results_time_iter = DataFrame(
    μ = df_results_NCRPG.μ,
    n = Int.(df_results_NCRPG.n),
    p = Int.(df_results_NCRPG.p),
    NCRPG_time = df_results_NCRPG.time,
    NCRPG_iter = Int.(round.(df_results_NCRPG.iterations, digits = 0)),
    NCRPG_bt_time = df_results_NCRPG_bt.time,
    NCRPG_bt_iter = Int.(round.(df_results_NCRPG_bt.iterations, digits = 0)),
    RPG_time = df_results_RPG.time,
    RPG_iter = Int.(round.(df_results_RPG.iterations, digits = 0)),
)
df_results_obj_spar_orth = DataFrame(
    μ = df_results_NCRPG.μ,
    n = Int.(df_results_NCRPG.n),
    p = Int.(df_results_NCRPG.p),
    NCRPG_obj = df_results_NCRPG.objective,
    NCRPG_sparsity = df_results_NCRPG.sparsity,
    NCRPG_orth = df_results_NCRPG.orthogonality,
    NCRPG_bt_obj = df_results_NCRPG_bt.objective,
    NCRPG_bt_sparsity = df_results_NCRPG_bt.sparsity,
    NCRPG_bt_orth = df_results_NCRPG_bt.orthogonality,
    RPG_obj = df_results_RPG.objective,
    RPG_sparsity = df_results_RPG.sparsity,
    RPG_orth = df_results_RPG.orthogonality,
)
# Write the results to CSV files
CSV.write(joinpath(results_folder, "results-OB-time-iter-$(m_tests).csv"), df_results_time_iter)
CSV.write(joinpath(results_folder, "results-OB-obj-spar-orth-$(m_tests).csv"), df_results_obj_spar_orth)
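The tables below can then be rendered from these data frames, for example with PrettyTables (already loaded above); a minimal call would be:

pretty_table(df_results_time_iter)
pretty_table(df_results_obj_spar_orth)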
The following tables compare the performance of the algorithms, averaged over the 10 runs per setting. First, we look at the runtime (in seconds) and the number of iterations for each algorithm.
| μ | n | p | NCRPG_time | NCRPG_iter | NCRPG_bt_time | NCRPG_bt_iter | RPG_time | RPG_iter |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.1 | 100 | 5 | 0.684749 | 30786 | 0.529327 | 4416 | 1.08593 | 30786 |
| 0.1 | 200 | 5 | 1.62113 | 31345 | 1.00647 | 3819 | 2.39352 | 31346 |
| 0.1 | 300 | 5 | 3.75444 | 35681 | 1.64839 | 3079 | 5.23643 | 35683 |
| 0.5 | 100 | 5 | 0.279035 | 11953 | 0.0954497 | 809 | 0.411548 | 11957 |
| 0.5 | 200 | 5 | 0.966696 | 17982 | 0.465404 | 1495 | 1.37787 | 17796 |
| 0.5 | 300 | 5 | 2.42251 | 21983 | 1.1225 | 1638 | 3.12809 | 22019 |
| 1.0 | 100 | 5 | 0.23533 | 9808 | 0.179321 | 1090 | 0.33349 | 9817 |
| 1.0 | 200 | 5 | 0.487548 | 9601 | 0.420006 | 819 | 0.699072 | 9614 |
| 1.0 | 300 | 5 | 0.0342426 | 331 | 0.00969333 | 26 | 0.0528303 | 331 |
Note that backtracking reduces the iteration count by roughly an order of magnitude, while the constant-stepsize NCRPG and RPG need almost identical numbers of iterations, with NCRPG being consistently faster. Second, we look at the objective values, sparsity, and orthogonality of the solutions found by each algorithm.
| μ | n | p | NCRPG_obj | NCRPG_sparsity | NCRPG_orth | NCRPG_bt_obj | NCRPG_bt_sparsity | NCRPG_bt_orth | RPG_obj | RPG_sparsity | RPG_orth |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.1 | 100 | 5 | 3.20343 | 0.475 | 0.145636 | 3.20463 | 0.4756 | 0.14826 | 3.20343 | 0.475 | 0.145636 |
| 0.1 | 200 | 5 | 4.38751 | 0.5203 | 0.124219 | 4.38662 | 0.5178 | 0.124448 | 4.38751 | 0.5203 | 0.124219 |
| 0.1 | 300 | 5 | 5.22587 | 0.546533 | 0.0992134 | 5.22137 | 0.5494 | 0.0984192 | 5.22587 | 0.546533 | 0.0992134 |
| 0.5 | 100 | 5 | 13.0305 | 0.7356 | 0.105535 | 13.039 | 0.7334 | 0.111901 | 13.0305 | 0.7356 | 0.105535 |
| 0.5 | 200 | 5 | 16.8312 | 0.8125 | 0.0786428 | 16.8541 | 0.8154 | 0.0822406 | 16.8193 | 0.8124 | 0.080326 |
| 0.5 | 300 | 5 | 19.2926 | 0.867667 | 0.0621717 | 19.2791 | 0.869733 | 0.0622674 | 19.2926 | 0.867667 | 0.0621717 |
| 1.0 | 100 | 5 | 22.0226 | 0.874 | 0.0620929 | 21.9882 | 0.8742 | 0.0581779 | 22.0226 | 0.874 | 0.0620929 |
| 1.0 | 200 | 5 | 25.5495 | 0.9783 | 0.0540715 | 25.5905 | 0.9822 | 0.0540714 | 25.5495 | 0.9783 | 0.0540715 |
| 1.0 | 300 | 5 | 25.0829 | 0.996667 | 0.0 | 25.0812 | 0.996667 | 0.0 | 25.0829 | 0.996667 | 0.0 |
Technical details
This tutorial is cached. It was last run on the following package versions.
using Pkg
Pkg.status()
Status `~/Repositories/Julia/ManoptExamples.jl/examples/Project.toml`
[6e4b80f9] BenchmarkTools v1.6.0
[336ed68f] CSV v0.10.15
[13f3f980] CairoMakie v0.15.6
[0ca39b1e] Chairmarks v1.3.1
[35d6a980] ColorSchemes v3.31.0
⌅ [5ae59095] Colors v0.12.11
[a93c6f00] DataFrames v1.8.0
[31c24e10] Distributions v0.25.120
[7073ff75] IJulia v1.30.4
[682c06a0] JSON v0.21.4
[8ac3fa9e] LRUCache v1.6.2
[b964fa9f] LaTeXStrings v1.4.0
[d3d80556] LineSearches v7.4.0
[ee78f7c6] Makie v0.24.6
[af67fdf4] ManifoldDiff v0.4.4
[1cead3c2] Manifolds v0.10.23
[3362f125] ManifoldsBase v1.2.0
[0fc0a36d] Manopt v0.5.23 `../../Manopt.jl`
[5b8d5e80] ManoptExamples v0.1.15 `..`
[51fcb6bd] NamedColors v0.2.3
[91a5bcdd] Plots v1.41.1
[08abe8d2] PrettyTables v3.0.11
[6099a3de] PythonCall v0.9.28
[f468eda6] QuadraticModels v0.9.14
[1e40b3f8] RipQP v0.7.0
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`
This tutorial was last rendered October 1, 2025, 22:15:43.
Literature
- [BJJP25b]
- R. Bergmann, H. Jasa, P. J. John and M. Pfeffer. The Intrinsic Riemannian Proximal Gradient Method for Nonconvex Optimization. Preprint (2025), arXiv:2506.09775.
- [HW21]
- W. Huang and K. Wei. Riemannian proximal gradient methods. Mathematical Programming 194, 371–413 (2021).