Sparse PCA

Paula John, Hajg Jasa 2025-10-01

Introduction

In this example we use the Nonconvex Riemannian Proximal Gradient (NCRPG) method [BJJP25b] and compare it to the Riemannian Proximal Gradient (RPG) method [HW21]. This example reproduces the results from [BJJP25b], Section 6.1. The numbers may vary slightly due to having run this notebook on a different machine.

using PrettyTables
using BenchmarkTools
using CSV, DataFrames
using ColorSchemes, Plots, LaTeXStrings
using Random, LinearAlgebra, LRUCache, Distributions
using ManifoldDiff, Manifolds, Manopt, ManoptExamples

The Problem

Let \mathcal M = \mathrm{OB}(n,r) be the oblique manifold, i.e., the set of n \times r matrices with unit-norm columns. Let g \colon \mathcal M \to \mathbb R be defined by

\[g(X) = \frac{1}{2} \Vert X^\top A^\top A X - D^2 \Vert^2,\]

where A \in \mathbb R^{m \times n} is a data matrix, D = \mathrm{diag}(d_1, \ldots, d_r) is a diagonal matrix containing the top r singular values of A, and \Vert \cdot \Vert is the Frobenius norm.

Let h \colon \mathcal M \to \mathbb R be defined by

\[h(X) = \mu \Vert X \Vert_1\]

be the sparsity-enforcing term given by the \ell_1-norm, where \mu \ge 0 is a regularization parameter.

We define our total objective function as f = g + h. The goal is to find the minimizer of f on \mathcal M, which is heuristically the point that diagonalizes A^\top A as much as possible while being sparse.

Numerical Experiment

We initialize the experiment parameters, as well as some utility functions.

# Set random seed for reproducibility
random_seed = 1520
Random.seed!(random_seed)
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 2.0
m_tests = 10 # number of tests for each parameter setting
means = 20 # number of means to compute

atol = 1e-7
max_iters = 100000
n_p_array = [(100,5), (200,5), (300, 5)]
μs = [t for t in [0.1, 0.5, 1.0]]

We define a function to generate the test data for the Sparse PCA problem.

function gen_test_data_SPCA(n, m, p)
    A = rand(Normal(0, 1.0), (m, n))
    for i in 1:n
        A[:, i] = A[:, i] .- mean(A[:, i])
        A[:, i] = A[:, i] / std(A[:, i])
    end
    svdA = svd(A)
    Vt = svdA.Vt
    PCs = Vt[:, 1:p]
    d = svdA.S[1:p]
    return A, PCs, d
end

We define the proximal operator for the \ell_1-norm on the oblique manifold, following [BJJP25b].

# Returns prox_{μ||.||_1}(M,x) on the Oblique Manifold OB(n,p) with respect to riemannian distance
function prox_l1_OB(n, p, μ; tol = 1e-10, max_iters = 10)
    return function prox_l1_OB_μ(M, λ, X)
        μλ = μ * λ
        prox_X = Array{Float64}(undef, n, p)
        for k in 1:p
            x = X[:, k]
            t = μλ
            px_t = X[:, k]
            for _ in 1:max_iters
                t_old = t

                z = abs.(x) .- t
                prox_Rn_t = (z .> 0) .* sign.(x) .* z

                px_t = prox_Rn_t/norm(prox_Rn_t)
                xpx_t = x'px_t
                if xpx_t < 1
                    t = μλ * sqrt(1-(xpx_t)^2)/acos(xpx_t)
                else
                    px_t = x
                    prox_X[:, k] = x
                    break
                end
                if abs(t - t_old) < tol
                    prox_X[:, k] = px_t
                    break
                end
            end
            prox_X[:, k] = px_t
        end
        return prox_X
    end
end
# Objective, gradient, and proxes
g_(M, X, H, D) = 0.5 * norm(X'H * X - D)^2
function grad_g_(M, X, H, D)
    HX = H*X
    return project(M, X, 2*HX*(X'HX - D))
end
h_(M, X, μ) = μ * norm(X, 1)
f_(M, X, H, D, μ) = 0.5 * norm(X'H * X - D)^2 + μ * norm(X, 1)

We introduce an implementation of the RPG method for the Sparse PCA problem on the oblique manifold, following [HW21].

# Implementation of the proximal operator for the ℓ1-norm on the Oblique manifold
function RPG_prox_OB(S, X, grad_fX, λ, L, n, p; max_iters  = 10, tol=1e-10)
    λ̃ = λ/L
    d = 0
    for k in 1:p
        x = X[:,k]
        ξ_x = 1/L * grad_fX[:,k]

        neg∇h = (x-ξ_x)/λ̃
        i_max = argmax(abs.(neg∇h))
        if abs(neg∇h[i_max]) <= 1.0
            y = sign(neg∇h[i_max])*(1:n .== i_max)
        else
            z = abs.(neg∇h) .- 1.0
            Act_set = z .> 0
            y = Act_set .* sign.(neg∇h) .* z
            y = y/norm(y)
        end
        for j in 1:max_iters
            xty = x'y
            if xty >= 1
                sy = -1
                ty = 1
            else
                ξty = ξ_x'y
                acosxty = acos(xty)
                α = 1-xty^2
                sy = - acosxty/sqrt(α) - ξty/α + acosxty * ξty * xty / sqrt(α^3)
                ty = acosxty/sqrt(α)
            end
            neg∇h = -(sy*x+ty*ξ_x)/λ̃

            i_max = argmax(abs.(neg∇h))
            if abs(neg∇h[i_max]) <= 1.0
                y_new = sign(neg∇h[i_max])*(1:n .== i_max)
            else
                z = abs.(neg∇h) .- 1.0
                Act_set = z .> 0
                y_new = Act_set .* sign.(neg∇h) .* z
                y_new = y_new/norm(y_new)
            end

            if max(abs(xty-x'y_new), abs(ξ_x'y-ξ_x'y_new)) < tol
                break
            end
            y = y_new
        end

        d += distance(S, x,y)^2
        X[:,k] = y
    end
    return sqrt(d)
end
#
# RPG implementation for Sparse PCA on the Oblique manifold
function RPG_SPCA_OB(M, H, D, μ, L, start, prox_fun; max_iters  = 1000, stop = 1e-8, record = false)
    n, p = size(start)
    S = Sphere(n-1)
    cost_fun(M,X) =  0.5*norm(X'H*X-D)^2 + μ*norm(X,1)
    function grad_f(M,X, D=D)
        HX = H*X
        return project(M, X, 2*HX*(X'HX-D))
    end
    X = copy(start)
    if !record
        for i in 1:max_iters
            change = prox_fun(S, X, grad_f(M,X,D), μ, L, n, p)
            if L*change < stop
                return X, i
            end
        end
        return X, max_iters
    else
        Iterates = []
        for i in 1:max_iters
            change = prox_fun(S, X, grad_f(M,X,D), μ, L, n, p)
            push!(Iterates, copy(X))
            if L*change < stop
                return Iterates, i
            end
        end
        return Iterates, max_iters
    end
end

We set up some variables to collect the results of the experiments and initialize the dataframes

And run the experiments

for (n, p) in n_p_array
    # Define manifold
    OB = Oblique(n, p)
    for m in 1:m_tests
        # Construct problem
        A, PCs, d = gen_test_data_SPCA(n, means, p)
        H = A'A / norm(A'A) * 10
        D = diagm(svd(H).S[1:p])
        L = 2 * tr(H)

        for (c, μ) in enumerate(μs)
            # Localize functions
            g(M, X) = g_(M, X, H, D)
            grad_g(M, X) = grad_g_(M, X, H, D)
            h(M, X) = h_(M, X, μ)
            prox_norm1_NCRPG = prox_l1_OB(n, p, μ)
            f(M, X) = f_(M, X, H, D, μ)
            #
            # Parameters
            step_size = 1/L
            init_step_size_bt = 10 * step_size
            stop_step_size_bt = atol
            stop_RPG = atol
            stop_NCRPG = atol
            stop_NCRPG_bt = atol
            #
            # Fix starting point
            start = rand(OB)
            #
            # Optimization
            # NCRPG
            rec_NCRPG = proximal_gradient_method(OB, f, g, grad_g, start;
                prox_nonsmooth = prox_norm1_NCRPG,
                stepsize = ConstantLength(step_size),
                record = [:Iteration, :Iterate],
                return_state = true,
                stopping_criterion = StopAfterIteration(max_iters)| StopWhenGradientMappingNormLess(stop_NCRPG)
            )
            # Benchmark NCRPG
            bm_NCRPG = @benchmark proximal_gradient_method($OB, $f, $g, $grad_g, $start;
                prox_nonsmooth = $prox_norm1_NCRPG,
                stepsize = ConstantLength($step_size),
                stopping_criterion = StopAfterIteration($max_iters)| StopWhenGradientMappingNormLess($stop_NCRPG)
            )
            # NCRPG with backtracking
            rec_NCRPG_bt = proximal_gradient_method(OB, f, g, grad_g, start;
                prox_nonsmooth = prox_norm1_NCRPG,
                stepsize = ProximalGradientMethodBacktracking(;
                    strategy = :nonconvex,
                    initial_stepsize = init_step_size_bt,
                    stop_when_stepsize_less = stop_step_size_bt
                ),
                record = [:Iteration, :Iterate],
                return_state = true,
                stopping_criterion = StopAfterIteration(max_iters)| StopWhenGradientMappingNormLess(stop_NCRPG_bt)
            )
            # Benchmark NCRPG with backtracking
            bm_NCRPG_bt = @benchmark proximal_gradient_method($OB, $f, $g, $grad_g, $start;
                prox_nonsmooth = $prox_norm1_NCRPG,
                stepsize = ProximalGradientMethodBacktracking(;
                    strategy = :nonconvex,
                    initial_stepsize = $init_step_size_bt,
                    stop_when_stepsize_less = $stop_step_size_bt
                ),
                stopping_criterion = StopAfterIteration($max_iters)| StopWhenGradientMappingNormLess($stop_NCRPG_bt)
            )
            # RPG
            Iterates_RPG, it_RPG = RPG_SPCA_OB(OB, H, D, μ, L, start, RPG_prox_OB;
                max_iters = max_iters,
                stop = stop_RPG,
                record = true
            )
            bm_RPG = @benchmark RPG_SPCA_OB($OB, $H, $D, $μ, $L, $start, $RPG_prox_OB;
                max_iters = $max_iters,
                stop = $stop_RPG
            )
            #
            # Collect test results
            Iterates_NCRPG  = get_record(rec_NCRPG, :Iteration, :Iterate)
            res_NCRPG       = Iterates_NCRPG[end]
            time_NCRPG      = time(median(bm_NCRPG))/1e9
            obj_NCRPG       = f(OB, res_NCRPG)
            spar_NCRPG      = sum(abs.(res_NCRPG).< 1e-8)/n/p
            it_NCRPG        = length(Iterates_NCRPG)
            orth_NCRPG      = norm(res_NCRPG'*res_NCRPG - I(p))
            # NCRPG with backtracking
            Iterates_NCRPG_bt  = get_record(rec_NCRPG_bt, :Iteration, :Iterate)
            res_NCRPG_bt       = Iterates_NCRPG_bt[end]
            time_NCRPG_bt      = time(median(bm_NCRPG_bt))/1e9
            obj_NCRPG_bt       = f(OB, res_NCRPG_bt)
            spar_NCRPG_bt      = sum(abs.(res_NCRPG_bt).< 1e-8)/n/p
            it_NCRPG_bt        = length(Iterates_NCRPG_bt)
            orth_NCRPG_bt      = norm(res_NCRPG_bt'*res_NCRPG_bt - I(p))
            # RPG
            res_RPG         = Iterates_RPG[end]
            time_RPG        = time(median(bm_RPG))/1e9
            obj_RPG         = f(OB, res_RPG)
            spar_RPG        = sum(abs.(res_RPG).< 1e-8)/n/p
            orth_RPG        = norm(res_RPG'*res_RPG - I(p))
            #
            # Update results
            # Time values
            time_NCRPG_tmp[c]      += time_NCRPG
            time_NCRPG_bt_tmp[c]   += time_NCRPG_bt
            time_RPG_tmp[c]        += time_RPG
            # Objective values
            obj_NCRPG_tmp[c]       += obj_NCRPG
            obj_NCRPG_bt_tmp[c]    += obj_NCRPG_bt
            obj_RPG_tmp[c]         += obj_RPG
            # Sparsity values
            spar_NCRPG_tmp[c]      += spar_NCRPG
            spar_NCRPG_bt_tmp[c]   += spar_NCRPG_bt
            spar_RPG_tmp[c]        += spar_RPG
            # Orthogonality values
            orth_NCRPG_tmp[c]      += orth_NCRPG
            orth_NCRPG_bt_tmp[c]   += orth_NCRPG_bt
            orth_RPG_tmp[c]        += orth_RPG
            # Iteration values
            it_NCRPG_tmp[c]        += it_NCRPG
            it_NCRPG_bt_tmp[c]     += it_NCRPG_bt
            it_RPG_tmp[c]          += it_RPG
        end
    end
    for (c, μ) in enumerate(μs)
        push!(df_results_RPG,
            [μ, n, p, time_RPG_tmp[c]/m_tests, obj_RPG_tmp[c]/m_tests, spar_RPG_tmp[c]/m_tests, it_RPG_tmp[c]/m_tests, orth_RPG_tmp[c]/m_tests]
        )
        push!(df_results_NCRPG,
            [μ, n, p, time_NCRPG_tmp[c]/m_tests, obj_NCRPG_tmp[c]/m_tests, spar_NCRPG_tmp[c]/m_tests, it_NCRPG_tmp[c]/m_tests, orth_NCRPG_tmp[c]/m_tests]
        )
        push!(df_results_NCRPG_bt,
            [μ, n, p, time_NCRPG_bt_tmp[c]/m_tests, obj_NCRPG_bt_tmp[c]/m_tests, spar_NCRPG_bt_tmp[c]/m_tests, it_NCRPG_bt_tmp[c]/m_tests, orth_NCRPG_bt_tmp[c]/m_tests]
        )
    end
    #
    # Reset data collection variables
    time_RPG_tmp      .= zeros(length(μs))
    time_NCRPG_tmp    .= zeros(length(μs))
    time_NCRPG_bt_tmp .= zeros(length(μs))
    obj_RPG_tmp       .= zeros(length(μs))
    obj_NCRPG_tmp     .= zeros(length(μs))
    obj_NCRPG_bt_tmp  .= zeros(length(μs))
    spar_RPG_tmp      .= zeros(length(μs))
    spar_NCRPG_tmp    .= zeros(length(μs))
    spar_NCRPG_bt_tmp .= zeros(length(μs))
    it_RPG_tmp        .= zeros(length(μs))
    it_NCRPG_tmp      .= zeros(length(μs))
    it_NCRPG_bt_tmp   .= zeros(length(μs))
    orth_RPG_tmp      .= zeros(length(μs))
    orth_NCRPG_tmp    .= zeros(length(μs))
    orth_NCRPG_bt_tmp .= zeros(length(μs))
end

We export the results to CSV files

<details class="code-fold"> <summary>Code</summary>

# Sort the dataframes by the parameter μ and create the final results dataframes
df_results_NCRPG = sort(df_results_NCRPG, :μ)
df_results_NCRPG_bt = sort(df_results_NCRPG_bt, :μ)
df_results_RPG = sort(df_results_RPG, :μ)
df_results_time_iter = DataFrame(
    μ             = df_results_NCRPG.μ,
    n             = Int.(df_results_NCRPG.n),
    p             = Int.(df_results_NCRPG.p),
    NCRPG_time     = df_results_NCRPG.time,
    NCRPG_iter     = Int.(round.(df_results_NCRPG.iterations, digits = 0)),
    NCRPG_bt_time  = df_results_NCRPG_bt.time,
    NCRPG_bt_iter  = Int.(round.(df_results_NCRPG_bt.iterations, digits = 0)),
    RPG_time     = df_results_RPG.time,
    RPG_iter     = Int.(round.(df_results_RPG.iterations, digits = 0)),
)
df_results_obj_spar_orth = DataFrame(
    μ               = df_results_NCRPG.μ,
    n               = Int.(df_results_NCRPG.n),
    p               = Int.(df_results_NCRPG.p),
    NCRPG_obj       = df_results_NCRPG.objective,
    NCRPG_sparsity  = df_results_NCRPG.sparsity,
    NCRPG_orth      = df_results_NCRPG.orthogonality,
    NCRPG_bt_obj    = df_results_NCRPG_bt.objective,
    NCRPG_bt_sparsity = df_results_NCRPG_bt.sparsity,
    NCRPG_bt_orth   = df_results_NCRPG_bt.orthogonality,
    RPG_obj         = df_results_RPG.objective,
    RPG_sparsity    = df_results_RPG.sparsity,
    RPG_orth        = df_results_RPG.orthogonality,
)
# Write the results to CSV files
CSV.write(joinpath(results_folder, "results-OB-time-iter-$(m_tests).csv"), df_results_time_iter)
CSV.write(joinpath(results_folder, "results-OB-obj-spar-orth-$(m_tests).csv"), df_results_obj_spar_orth)

</details>

We can take a look at how the algorithms compare to each other in their performance with the following tables. First, we look at the time and number of iterations for each algorithm.

μnpNCRPGconsttimeNCRPGconstiterNCRPGbttimeNCRPGbtiterRPG_timeRPG_iter
0.110050.748053428740.57898162341.2599842874
0.120051.29457317950.71150634942.0203631814
0.130053.19935405321.536944.6500440535
0.510050.196853105880.09556479850.29951510590
0.520050.627545143980.28849412300.93094814407
0.530052.16516260570.63182812922.9476426080
1.010050.17878497050.18077114720.2629479732
1.020050.23772659030.2712666260.354525911
1.030050.03580254490.00785541270.0564122449

Second, we look at the objective values, sparsity, and orthogonality of the solutions found by each algorithm.

μnpNCRPGconstobjNCRPGconstsparNCRPGconstorthNCRPGbtobjNCRPGbtsparNCRPGbtorthRPG_objRPG_sparRPG_orth
0.110053.220580.47180.156323.220060.47520.1552143.220580.47180.156319
0.120054.395650.51770.1163744.395810.51710.1182284.395650.51770.116374
0.130055.241130.5522670.1014855.2420.55160.1003025.241130.5522670.101485
0.5100513.05950.73480.12316413.09920.73440.11737513.05950.73480.123164
0.5200516.88250.8130.073209916.86330.81170.077542716.88250.8130.0732099
0.5300519.1590.8721330.055983919.19610.8733330.059589819.1590.8721330.0559839
1.0100522.12090.87220.060249522.07760.87280.072247522.12090.87220.0602495
1.0200525.59640.97942.33716e-1625.61140.98240.042979425.59640.97942.43844e-16
1.0300524.74440.9966670.024.74570.9966670.024.74440.9966670.0

Technical details

This tutorial is cached. It was last run on the following package versions.

<details class="code-fold"> <summary>Code</summary>

using Pkg
Pkg.status()

</details>

Status `~/Repositories/Julia/ManoptExamples.jl/examples/Project.toml`
  [6e4b80f9] BenchmarkTools v1.6.0
  [336ed68f] CSV v0.10.15
  [13f3f980] CairoMakie v0.15.6
  [0ca39b1e] Chairmarks v1.3.1
  [35d6a980] ColorSchemes v3.31.0
  [5ae59095] Colors v0.13.1
  [a93c6f00] DataFrames v1.8.0
  [31c24e10] Distributions v0.25.122
⌅ [682c06a0] JSON v0.21.4
  [8ac3fa9e] LRUCache v1.6.2
  [b964fa9f] LaTeXStrings v1.4.0
  [d3d80556] LineSearches v7.4.0
  [ee78f7c6] Makie v0.24.6
  [af67fdf4] ManifoldDiff v0.4.5
  [1cead3c2] Manifolds v0.11.0
  [3362f125] ManifoldsBase v2.0.0
  [0fc0a36d] Manopt v0.5.25
  [5b8d5e80] ManoptExamples v0.1.16 `..`
  [51fcb6bd] NamedColors v0.2.3
  [91a5bcdd] Plots v1.41.1
  [08abe8d2] PrettyTables v3.1.0
  [6099a3de] PythonCall v0.9.28
  [f468eda6] QuadraticModels v0.9.14
  [1e40b3f8] RipQP v0.7.0
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`

This tutorial was last rendered October 15, 2025, 19:20:36.

Literature

[BJJP25b]
R. Bergmann, H. Jasa, P. J. John and M. Pfeffer. The Intrinsic Riemannian Proximal Gradient Method for Nononvex Optimization, preprint (2025), arXiv:2506.09775.
[HW21]
W. Huang and K. Wei. Riemannian proximal gradient methods. Mathematical Programming 194, 371–413 (2021).