Sparse PCA

Paula John, Hajg Jasa 2025-10-01

Introduction

In this example we use the Nonconvex Riemannian Proximal Gradient (NCRPG) method [BJJP25b] and compare it to the Riemannian Proximal Gradient (RPG) method [HW21]. This example reproduces the results from [BJJP25b], Section 6.1. The numbers reported here may differ slightly from those in the paper, since this notebook was run on a different machine.

using PrettyTables
using BenchmarkTools
using CSV, DataFrames
using ColorSchemes, Plots, LaTeXStrings
using Random, LinearAlgebra, LRUCache, Distributions
using ManifoldDiff, Manifolds, Manopt, ManoptExamples

The Problem

Let $\mathcal M = \mathrm{OB}(n,r)$ be the oblique manifold, i.e., the set of $n \times r$ matrices with unit-norm columns. Let $g \colon \mathcal M \to \mathbb R$ be defined by

\[g(X) = \frac{1}{2} \Vert X^\top A^\top A X - D^2 \Vert^2,\]

where $A \in \mathbb R^{m \times n}$ is a data matrix, $D = \mathrm{diag}(d_1, \ldots, d_r)$ is a diagonal matrix containing the top $r$ singular values of $A$, and $\Vert \cdot \Vert$ is the Frobenius norm.

Let $h \colon \mathcal M \to \mathbb R$,

\[h(X) = \mu \Vert X \Vert_1,\]

be the sparsity-enforcing term given by the (entrywise) $\ell_1$-norm, where $\mu \ge 0$ is a regularization parameter.

We define our total objective function as $f = g + h$. The goal is to find the minimizer of $f$ on $\mathcal M$, which is heuristically the point that diagonalizes $A^\top A$ as much as possible while being sparse.
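To see why minimizers of $g$ alone relate to the SVD of $A$: if $X$ collects the top $r$ right singular vectors of $A$, then $X^\top A^\top A X = D^2$ holds exactly, so $g(X) = 0$. A small self-contained sketch (variable names are ours, for illustration only):

```julia
using LinearAlgebra

# Small illustration: at Vr, the matrix of top-r right singular vectors,
# Vr' * A' * A * Vr = D^2 holds exactly, so the smooth term g vanishes there.
mA, nA, rA = 8, 5, 2
Aex = randn(mA, nA)
F = svd(Aex)
Vr = F.V[:, 1:rA]                # unit-norm columns, so Vr lies on OB(nA, rA)
D2 = Diagonal(F.S[1:rA] .^ 2)    # squared top-r singular values
g_val = 0.5 * norm(Vr' * (Aex' * Aex) * Vr - D2)^2
```

Minimizing $f = g + h$ then trades this exact diagonalization against sparsity of $X$.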

Numerical Experiment

We initialize the experiment parameters, as well as some utility functions.

# Set random seed for reproducibility
random_seed = 1520
Random.seed!(random_seed)
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 2.0 
m_tests = 10 # number of tests for each parameter setting
means = 20 # number of means to compute

atol = 1e-7
max_iters = 100000
n_p_array = [(100,5), (200,5), (300, 5)]
μs = [0.1, 0.5, 1.0]

We define a function to generate the test data for the Sparse PCA problem.

function gen_test_data_SPCA(n, m, p)
    # Draw an m × n data matrix with i.i.d. standard normal entries
    A = rand(Normal(0, 1.0), (m, n))
    # Standardize each column: zero mean, unit standard deviation
    for i in 1:n
        A[:, i] = A[:, i] .- mean(A[:, i])
        A[:, i] = A[:, i] / std(A[:, i])
    end
    svdA = svd(A)
    # Top p right singular vectors (principal directions) and singular values
    PCs = svdA.V[:, 1:p]
    d = svdA.S[1:p]
    return A, PCs, d
end
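As a quick sanity check on the standardization step above (a minimal self-contained sketch on a small matrix unrelated to the experiment): after centering and rescaling, every column has mean approximately 0 and standard deviation approximately 1.

```julia
using Random, Statistics

Random.seed!(42)
B = randn(20, 6)                 # small matrix, unrelated to the experiment data
for i in axes(B, 2)
    B[:, i] .-= mean(B[:, i])    # center the column
    B[:, i] ./= std(B[:, i])     # rescale to unit standard deviation
end
col_means = [mean(c) for c in eachcol(B)]
col_stds = [std(c) for c in eachcol(B)]
```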

We define the proximal operator for the $\ell_1$-norm on the oblique manifold, following [BJJP25b].

# Returns prox_{μ||.||_1}(M, X) on the oblique manifold OB(n,p) with respect to the Riemannian distance
function prox_l1_OB(n, p, μ; tol = 1e-10, max_iters = 10)
    return function prox_l1_OB_μ(M, λ, X)
        μλ = μ * λ
        prox_X = Array{Float64}(undef, n, p)
        for k in 1:p 
            x = X[:, k]
            t = μλ
            px_t = X[:, k]
            for _ in 1:max_iters 
                t_old = t

                z = abs.(x) .- t 
                prox_Rn_t = (z .> 0) .* sign.(x) .* z

                px_t = prox_Rn_t/norm(prox_Rn_t)
                xpx_t = x'px_t
                if xpx_t < 1
                    t = μλ * sqrt(1-(xpx_t)^2)/acos(xpx_t)
                else
                    px_t = x
                    prox_X[:, k] = x
                    break
                end 
                if abs(t - t_old) < tol 
                    prox_X[:, k] = px_t
                    break
                end 
            end 
            prox_X[:, k] = px_t
        end 
        return prox_X
    end 
end
# Objective, gradient, and proxes
g(M, X, H, D) = 0.5 * norm(X'H * X - D)^2 
function grad_g(M, X, H, D)
    HX = H*X
    return project(M, X, 2*HX*(X'HX - D))
end
h(M, X, μ) = μ * norm(X, 1)
f(M, X, H, D, μ) = 0.5 * norm(X'H * X - D)^2 + μ * norm(X, 1)
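The fixed-point scheme inside `prox_l1_OB` operates column by column: each column lives on a unit sphere, where a Euclidean soft-thresholding step with threshold $t$ is followed by renormalization, and $t$ is then updated via the Riemannian distance formula from [BJJP25b]. A stripped-down, single-column sketch (the helper name `sphere_prox_l1` is ours, not part of any package):

```julia
using LinearAlgebra

# Single-column sketch of the sphere prox iteration used in prox_l1_OB:
# soft-threshold, renormalize, then update the threshold t until it stabilizes.
function sphere_prox_l1(x, μλ; tol = 1e-10, max_iters = 20)
    t, px = μλ, copy(x)
    for _ in 1:max_iters
        t_old = t
        z = max.(abs.(x) .- t, 0.0)        # Euclidean soft-thresholding
        px = sign.(x) .* z
        px ./= norm(px)                    # project back onto the sphere
        c = x'px
        c ≥ 1 && return x                  # x itself is the prox
        t = μλ * sqrt(1 - c^2) / acos(c)   # threshold update from [BJJP25b]
        abs(t - t_old) < tol && break
    end
    return px
end

x = normalize(randn(8))
px = sphere_prox_l1(x, 0.1)
```

By construction the output stays on the sphere, matching the column-wise behavior of `prox_l1_OB` above.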

We introduce an implementation of the RPG method for the Sparse PCA problem on the oblique manifold, following [HW21].

# Implementation of the proximal operator for the ℓ1-norm on the Oblique manifold
function RPG_prox_OB(S, X, grad_fX, λ, L, n, p; max_iters  = 10, tol=1e-10)
    λ̃ = λ/L
    d = 0 
    for k in 1:p
        x = X[:,k] 
        ξ_x = 1/L * grad_fX[:,k]
        
        neg∇h = (x-ξ_x)/λ̃
        i_max = argmax(abs.(neg∇h))
        if abs(neg∇h[i_max]) <= 1.0
            y = sign(neg∇h[i_max])*(1:n .== i_max)
        else
            z = abs.(neg∇h) .- 1.0
            Act_set = z .> 0  
            y = Act_set .* sign.(neg∇h) .* z
            y = y/norm(y)
        end
        for j in 1:max_iters  
            xty = x'y 
            if xty >= 1
                sy = -1 
                ty = 1
            else 
                ξty = ξ_x'y
                acosxty = acos(xty)
                α = 1-xty^2 
                sy = - acosxty/sqrt(α) - ξty/α + acosxty * ξty * xty / sqrt(α^3)
                ty = acosxty/sqrt(α)
            end
            neg∇h = -(sy*x+ty*ξ_x)/λ̃
            
            i_max = argmax(abs.(neg∇h))
            if abs(neg∇h[i_max]) <= 1.0
                y_new = sign(neg∇h[i_max])*(1:n .== i_max)
            else
                z = abs.(neg∇h) .- 1.0
                Act_set = z .> 0  
                y_new = Act_set .* sign.(neg∇h) .* z
                y_new = y_new/norm(y_new)
            end
            
            if max(abs(xty-x'y_new), abs(ξ_x'y-ξ_x'y_new)) < tol 
                break 
            end 
            y = y_new 
        end 

        d += distance(S, x,y)^2
        X[:,k] = y
    end 
    return sqrt(d)
end
#
# RPG implementation for Sparse PCA on the Oblique manifold
function RPG_SPCA_OB(M, H, D, μ, L, start, prox_fun; max_iters  = 1000, stop = 1e-8, record = false)
    n, p = size(start)
    S = Sphere(n-1)
    cost_fun(M,X) =  0.5*norm(X'H*X-D)^2 + μ*norm(X,1)
    function grad_f(M,X, D=D)
        HX = H*X
        return project(M, X, 2*HX*(X'HX-D))
    end
    X = copy(start) 
    if !record 
        for i in 1:max_iters 
            change = prox_fun(S, X, grad_f(M,X,D), μ, L, n, p)
            if L*change < stop
                return X, i
            end 
        end 
        return X, max_iters  
    else 
        Iterates = []
        for i in 1:max_iters 
            change = prox_fun(S, X, grad_f(M,X,D), μ, L, n, p)
            push!(Iterates, copy(X))
            if L*change < stop
                return Iterates, i
            end 
        end 
        return Iterates, max_iters  
    end
end

We set up some variables to collect the results of the experiments and initialize the dataframes.
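The setup code itself is not included in this rendering; a hypothetical reconstruction, with the column order inferred from the `push!` calls and column accesses in the code below (the output folder path is an assumption), could look like:

```julia
using DataFrames

# Assumed output folder for the CSV export below
results_folder = joinpath(@__DIR__, "results")
isdir(results_folder) || mkpath(results_folder)

# Result tables: one row per (μ, n, p) setting, averaged over the test runs
empty_results() = DataFrame(
    μ = Float64[], n = Float64[], p = Float64[],
    time = Float64[], objective = Float64[], sparsity = Float64[],
    iterations = Float64[], orthogonality = Float64[],
)
df_results_RPG = empty_results()
df_results_NCRPG = empty_results()
df_results_NCRPG_bt = empty_results()

# Per-μ accumulators, reset after each (n, p) setting
μs_len = 3  # = length(μs) for μs = [0.1, 0.5, 1.0] defined above
time_RPG_tmp, time_NCRPG_tmp, time_NCRPG_bt_tmp = zeros(μs_len), zeros(μs_len), zeros(μs_len)
obj_RPG_tmp, obj_NCRPG_tmp, obj_NCRPG_bt_tmp = zeros(μs_len), zeros(μs_len), zeros(μs_len)
spar_RPG_tmp, spar_NCRPG_tmp, spar_NCRPG_bt_tmp = zeros(μs_len), zeros(μs_len), zeros(μs_len)
it_RPG_tmp, it_NCRPG_tmp, it_NCRPG_bt_tmp = zeros(μs_len), zeros(μs_len), zeros(μs_len)
orth_RPG_tmp, orth_NCRPG_tmp, orth_NCRPG_bt_tmp = zeros(μs_len), zeros(μs_len), zeros(μs_len)
```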

We then run the experiments.

for (n, p) in n_p_array
    # Define manifold
    OB = Oblique(n, p)
    for m in 1:m_tests
        # Construct problem 
        A, PCs, d = gen_test_data_SPCA(n, means, p)
        H = A'A / norm(A'A) * 10
        D = diagm(svd(H).S[1:p])
        L = 2 * tr(H)

        for (c, μ) in enumerate(μs)
            # Localize functions
            g(M, X) = g(M, X, H, D)
            grad_g(M, X) = grad_g(M, X, H, D)
            h(M, X) = h(M, X, μ)
            prox_norm1_NCRPG = prox_l1_OB(n, p, μ)
            f(M, X) = f(M, X, H, D, μ)
            #
            # Parameters
            step_size = 1/L
            init_step_size_bt = 10 * step_size
            stop_step_size_bt = atol
            stop_RPG = atol
            stop_NCRPG = atol
            stop_NCRPG_bt = atol
            #
            # Fix starting point
            start = rand(OB)
            #
            # Optimization
            # NCRPG
            rec_NCRPG = proximal_gradient_method(OB, f, g, grad_g, start;
                prox_nonsmooth = prox_norm1_NCRPG, 
                stepsize = ConstantLength(step_size),
                record = [:Iteration, :Iterate],
                return_state = true,
                stopping_criterion = StopAfterIteration(max_iters)| StopWhenGradientMappingNormLess(stop_NCRPG)
            )
            # Benchmark NCRPG
            bm_NCRPG = @benchmark proximal_gradient_method($OB, $f, $g, $grad_g, $start;
                prox_nonsmooth = $prox_norm1_NCRPG,
                stepsize = ConstantLength($step_size),
                stopping_criterion = StopAfterIteration($max_iters) | StopWhenGradientMappingNormLess($stop_NCRPG)
            )
            # NCRPG with backtracking
            rec_NCRPG_bt = proximal_gradient_method(OB, f, g, grad_g, start;
                prox_nonsmooth = prox_norm1_NCRPG, 
                stepsize = ProximalGradientMethodBacktracking(; 
                    strategy = :nonconvex, 
                    initial_stepsize = init_step_size_bt, 
                    stop_when_stepsize_less = stop_step_size_bt
                ),
                record = [:Iteration, :Iterate],
                return_state = true,
                stopping_criterion = StopAfterIteration(max_iters)| StopWhenGradientMappingNormLess(stop_NCRPG_bt)
            )
            # Benchmark NCRPG with backtracking
            bm_NCRPG_bt = @benchmark proximal_gradient_method($OB, $f, $g, $grad_g, $start;
                prox_nonsmooth = $prox_norm1_NCRPG,
                stepsize = ProximalGradientMethodBacktracking(;
                    strategy = :nonconvex,
                    initial_stepsize = $init_step_size_bt,
                    stop_when_stepsize_less = $stop_step_size_bt
                ),
                stopping_criterion = StopAfterIteration($max_iters) | StopWhenGradientMappingNormLess($stop_NCRPG_bt)
            )
            # RPG 
            Iterates_RPG, it_RPG = RPG_SPCA_OB(OB, H, D, μ, L, start, RPG_prox_OB;
                max_iters = max_iters, 
                stop = stop_RPG, 
                record = true
            )
            bm_RPG = @benchmark RPG_SPCA_OB($OB, $H, $D, $μ, $L, $start, $RPG_prox_OB;
                max_iters = $max_iters,
                stop = $stop_RPG
            )
            #
            # Collect test results
            Iterates_NCRPG  = get_record(rec_NCRPG, :Iteration, :Iterate)
            res_NCRPG       = Iterates_NCRPG[end]
            time_NCRPG      = time(median(bm_NCRPG))/1e9
            obj_NCRPG       = f(OB, res_NCRPG)
            spar_NCRPG      = sum(abs.(res_NCRPG).< 1e-8)/n/p
            it_NCRPG        = length(Iterates_NCRPG)
            orth_NCRPG      = norm(res_NCRPG'*res_NCRPG - I(p))
            # NCRPG with backtracking
            Iterates_NCRPG_bt  = get_record(rec_NCRPG_bt, :Iteration, :Iterate)
            res_NCRPG_bt       = Iterates_NCRPG_bt[end]
            time_NCRPG_bt      = time(median(bm_NCRPG_bt))/1e9
            obj_NCRPG_bt       = f(OB, res_NCRPG_bt)
            spar_NCRPG_bt      = sum(abs.(res_NCRPG_bt).< 1e-8)/n/p
            it_NCRPG_bt        = length(Iterates_NCRPG_bt)
            orth_NCRPG_bt      = norm(res_NCRPG_bt'*res_NCRPG_bt - I(p))
            # RPG
            res_RPG         = Iterates_RPG[end]
            time_RPG        = time(median(bm_RPG))/1e9
            obj_RPG         = f(OB, res_RPG)
            spar_RPG        = sum(abs.(res_RPG).< 1e-8)/n/p
            orth_RPG        = norm(res_RPG'*res_RPG - I(p))
            #
            # Update results
            # Time values
            time_NCRPG_tmp[c]      += time_NCRPG
            time_NCRPG_bt_tmp[c]   += time_NCRPG_bt
            time_RPG_tmp[c]        += time_RPG
            # Objective values
            obj_NCRPG_tmp[c]       += obj_NCRPG
            obj_NCRPG_bt_tmp[c]    += obj_NCRPG_bt
            obj_RPG_tmp[c]         += obj_RPG
            # Sparsity values
            spar_NCRPG_tmp[c]      += spar_NCRPG
            spar_NCRPG_bt_tmp[c]   += spar_NCRPG_bt
            spar_RPG_tmp[c]        += spar_RPG
            # Orthogonality values
            orth_NCRPG_tmp[c]      += orth_NCRPG
            orth_NCRPG_bt_tmp[c]   += orth_NCRPG_bt
            orth_RPG_tmp[c]        += orth_RPG
            # Iteration values
            it_NCRPG_tmp[c]        += it_NCRPG
            it_NCRPG_bt_tmp[c]     += it_NCRPG_bt
            it_RPG_tmp[c]          += it_RPG
        end 
    end 
    for (c, μ) in enumerate(μs)
        push!(df_results_RPG, 
            [μ, n, p, time_RPG_tmp[c]/m_tests, obj_RPG_tmp[c]/m_tests, spar_RPG_tmp[c]/m_tests, it_RPG_tmp[c]/m_tests, orth_RPG_tmp[c]/m_tests]
        )
        push!(df_results_NCRPG, 
            [μ, n, p, time_NCRPG_tmp[c]/m_tests, obj_NCRPG_tmp[c]/m_tests, spar_NCRPG_tmp[c]/m_tests, it_NCRPG_tmp[c]/m_tests, orth_NCRPG_tmp[c]/m_tests]
        )
        push!(df_results_NCRPG_bt, 
            [μ, n, p, time_NCRPG_bt_tmp[c]/m_tests, obj_NCRPG_bt_tmp[c]/m_tests, spar_NCRPG_bt_tmp[c]/m_tests, it_NCRPG_bt_tmp[c]/m_tests, orth_NCRPG_bt_tmp[c]/m_tests]
        )
    end
    #
    # Reset data collection variables
    time_RPG_tmp      .= zeros(length(μs))
    time_NCRPG_tmp    .= zeros(length(μs))
    time_NCRPG_bt_tmp .= zeros(length(μs))
    obj_RPG_tmp       .= zeros(length(μs))
    obj_NCRPG_tmp     .= zeros(length(μs))
    obj_NCRPG_bt_tmp  .= zeros(length(μs))
    spar_RPG_tmp      .= zeros(length(μs))
    spar_NCRPG_tmp    .= zeros(length(μs))
    spar_NCRPG_bt_tmp .= zeros(length(μs))
    it_RPG_tmp        .= zeros(length(μs))
    it_NCRPG_tmp      .= zeros(length(μs))
    it_NCRPG_bt_tmp   .= zeros(length(μs))
    orth_RPG_tmp      .= zeros(length(μs))
    orth_NCRPG_tmp    .= zeros(length(μs))
    orth_NCRPG_bt_tmp .= zeros(length(μs))
end

We export the results to CSV files

# Sort the dataframes by the parameter μ and create the final results dataframes
df_results_NCRPG = sort(df_results_NCRPG, :μ)
df_results_NCRPG_bt = sort(df_results_NCRPG_bt, :μ)
df_results_RPG = sort(df_results_RPG, :μ)
df_results_time_iter = DataFrame(
    μ             = df_results_NCRPG.μ,
    n             = Int.(df_results_NCRPG.n), 
    p             = Int.(df_results_NCRPG.p),
    NCRPG_time     = df_results_NCRPG.time, 
    NCRPG_iter     = Int.(round.(df_results_NCRPG.iterations, digits = 0)), 
    NCRPG_bt_time  = df_results_NCRPG_bt.time, 
    NCRPG_bt_iter  = Int.(round.(df_results_NCRPG_bt.iterations, digits = 0)),
    RPG_time     = df_results_RPG.time, 
    RPG_iter     = Int.(round.(df_results_RPG.iterations, digits = 0)),
)
df_results_obj_spar_orth = DataFrame(
    μ               = df_results_NCRPG.μ,
    n               = Int.(df_results_NCRPG.n), 
    p               = Int.(df_results_NCRPG.p),
    NCRPG_obj       = df_results_NCRPG.objective, 
    NCRPG_sparsity  = df_results_NCRPG.sparsity,  
    NCRPG_orth      = df_results_NCRPG.orthogonality,
    NCRPG_bt_obj    = df_results_NCRPG_bt.objective, 
    NCRPG_bt_sparsity = df_results_NCRPG_bt.sparsity,
    NCRPG_bt_orth   = df_results_NCRPG_bt.orthogonality,  
    RPG_obj         = df_results_RPG.objective, 
    RPG_sparsity    = df_results_RPG.sparsity, 
    RPG_orth        = df_results_RPG.orthogonality,
)
# Write the results to CSV files
CSV.write(joinpath(results_folder, "results-OB-time-iter-$(m_tests).csv"), df_results_time_iter)
CSV.write(joinpath(results_folder, "results-OB-obj-spar-orth-$(m_tests).csv"), df_results_obj_spar_orth)

We can take a look at how the algorithms compare to each other in their performance with the following tables. First, we look at the time and number of iterations for each algorithm.

| μ | n | p | NCRPG_const_time | NCRPG_const_iter | NCRPG_bt_time | NCRPG_bt_iter | RPG_time | RPG_iter |
|---|---|---|---|---|---|---|---|---|
| 0.1 | 100 | 5 | 0.684749 | 30786 | 0.529327 | 4416 | 1.08593 | 30786 |
| 0.1 | 200 | 5 | 1.62113 | 31345 | 1.00647 | 3819 | 2.39352 | 31346 |
| 0.1 | 300 | 5 | 3.75444 | 35681 | 1.64839 | 3079 | 5.23643 | 35683 |
| 0.5 | 100 | 5 | 0.279035 | 11953 | 0.0954497 | 809 | 0.411548 | 11957 |
| 0.5 | 200 | 5 | 0.966696 | 17982 | 0.465404 | 1495 | 1.37787 | 17796 |
| 0.5 | 300 | 5 | 2.42251 | 21983 | 1.12251 | 638 | 3.12809 | 22019 |
| 1.0 | 100 | 5 | 0.235339 | 808 | 0.179321 | 1090 | 0.333499 | 817 |
| 1.0 | 200 | 5 | 0.487548 | 9601 | 0.420006 | 819 | 0.699072 | 9614 |
| 1.0 | 300 | 5 | 0.0342426 | 331 | 0.00969333 | 326 | 0.0528303 | 331 |

Second, we look at the objective values, sparsity, and orthogonality of the solutions found by each algorithm.

| μ | n | p | NCRPG_const_obj | NCRPG_const_spar | NCRPG_const_orth | NCRPG_bt_obj | NCRPG_bt_spar | NCRPG_bt_orth | RPG_obj | RPG_spar | RPG_orth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 100 | 5 | 3.20343 | 0.475 | 0.145636 | 3.20463 | 0.4756 | 0.14826 | 3.20343 | 0.475 | 0.145636 |
| 0.1 | 200 | 5 | 4.38751 | 0.5203 | 0.124219 | 4.38662 | 0.5178 | 0.124448 | 4.38751 | 0.5203 | 0.124219 |
| 0.1 | 300 | 5 | 5.22587 | 0.546533 | 0.0992134 | 5.22137 | 0.5494 | 0.0984192 | 5.22587 | 0.546533 | 0.0992134 |
| 0.5 | 100 | 5 | 13.0305 | 0.7356 | 0.105535 | 13.039 | 0.7334 | 0.111901 | 13.0305 | 0.7356 | 0.105535 |
| 0.5 | 200 | 5 | 16.8312 | 0.8125 | 0.0786428 | 16.8541 | 0.8154 | 0.0822406 | 16.8193 | 0.8124 | 0.080326 |
| 0.5 | 300 | 5 | 19.2926 | 0.867667 | 0.0621717 | 19.2791 | 0.869733 | 0.0622674 | 19.2926 | 0.867667 | 0.0621717 |
| 1.0 | 100 | 5 | 22.0226 | 0.874 | 0.0620929 | 21.9882 | 0.8742 | 0.0581779 | 22.0226 | 0.874 | 0.0620929 |
| 1.0 | 200 | 5 | 25.5495 | 0.9783 | 0.0540715 | 25.5905 | 0.9822 | 0.0540714 | 25.5495 | 0.9783 | 0.0540715 |
| 1.0 | 300 | 5 | 25.0829 | 0.996667 | 0.0 | 25.0812 | 0.996667 | 0.0 | 25.0829 | 0.996667 | 0.0 |

Technical details

This tutorial is cached. It was last run on the following package versions.

using Pkg
Pkg.status()
Status `~/Repositories/Julia/ManoptExamples.jl/examples/Project.toml`
  [6e4b80f9] BenchmarkTools v1.6.0
  [336ed68f] CSV v0.10.15
  [13f3f980] CairoMakie v0.15.6
  [0ca39b1e] Chairmarks v1.3.1
  [35d6a980] ColorSchemes v3.31.0
⌅ [5ae59095] Colors v0.12.11
  [a93c6f00] DataFrames v1.8.0
  [31c24e10] Distributions v0.25.120
  [7073ff75] IJulia v1.30.4
  [682c06a0] JSON v0.21.4
  [8ac3fa9e] LRUCache v1.6.2
  [b964fa9f] LaTeXStrings v1.4.0
  [d3d80556] LineSearches v7.4.0
  [ee78f7c6] Makie v0.24.6
  [af67fdf4] ManifoldDiff v0.4.4
  [1cead3c2] Manifolds v0.10.23
  [3362f125] ManifoldsBase v1.2.0
  [0fc0a36d] Manopt v0.5.23 `../../Manopt.jl`
  [5b8d5e80] ManoptExamples v0.1.15 `..`
  [51fcb6bd] NamedColors v0.2.3
  [91a5bcdd] Plots v1.41.1
  [08abe8d2] PrettyTables v3.0.11
  [6099a3de] PythonCall v0.9.28
  [f468eda6] QuadraticModels v0.9.14
  [1e40b3f8] RipQP v0.7.0
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`

This tutorial was last rendered October 1, 2025, 22:15:43.

Literature

[BJJP25b]
R. Bergmann, H. Jasa, P. J. John and M. Pfeffer. The Intrinsic Riemannian Proximal Gradient Method for Nonconvex Optimization, preprint (2025), arXiv:2506.09775.
[HW21]
W. Huang and K. Wei. Riemannian proximal gradient methods. Mathematical Programming 194, 371–413 (2021).