Row-sparse Low-rank Matrix Recovery

Paula John, Hajg Jasa 2025-10-01

Introduction

In this example we use the Nonconvex Riemannian Proximal Gradient (NCRPG) method [BJJP25b] and compare it to the Riemannian Alternating Direction Method of Multipliers (RADMM) [LMS22]. It reproduces the results from [BJJP25b], Section 6.3; the numbers may vary slightly from those in the paper, since this notebook was run on a different machine.

using PrettyTables
using BenchmarkTools
using CSV, DataFrames
using ColorSchemes, Plots, LaTeXStrings
using Random, LinearAlgebra, LRUCache, Distributions
using ManifoldDiff, Manifolds, Manopt, ManoptExamples

The Problem

Let $\mathcal M = \mathcal M_r$ be the manifold of $M \times N$ matrices of fixed rank $r$. Let $g \colon \mathcal M \to \mathbb R$ be defined by

\[g(X) = \frac{1}{2}\Vert\mathbb A (X) - y\Vert_2^2,\]

where $\mathbb A \colon \mathbb R^{M \times N} \to \mathbb R^m$ is a linear measurement operator and $y \in \mathbb R^m$ is the vector of measurements.

Let $h \colon \mathcal M \to \mathbb R$,

\[h(X) = \mu \Vert X \Vert_{1, 2} = \mu \sum_{i=1}^{M} \Vert X_{i, \cdot} \Vert_2,\]

be the row-sparsity-enforcing term given by the $\ell_{1,2}$-norm, where $\mu \ge 0$ is a regularization parameter.

We define our total objective function as $f = g + h$; in the code below, the regularization parameter $\mu$ is denoted by c. The goal is to recover the low-rank and row-sparse signal $X$ from as few measurements $y$ as possible.
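
Schematically, one iteration of the Riemannian proximal gradient scheme takes a gradient step on $g$ followed by a (Riemannian) proximal step on $h$; see [BJJP25b] for the precise intrinsic formulation used by NCRPG:

\[X_{k+1} \in \operatorname{prox}_{\lambda_k h}\Bigl(\operatorname{retr}_{X_k}\bigl(-\lambda_k \operatorname{grad} g(X_k)\bigr)\Bigr),\]

where $\lambda_k > 0$ is the step size and $\operatorname{retr}$ denotes a retraction on $\mathcal M$.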

Numerical Experiment

We initialize the experiment parameters, as well as some utility functions.

# Set random seed for reproducibility
random_seed = 1520
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 180.0

atol = 1e-7
max_iters = 5000
c = 1e-4    # regularization parameter μ for the ℓ₁,₂-term
M = 500     # number of rows
N = 100     # number of columns
s = 10      # number of non-zero rows
r_m_array = [(1, 300), (2, 500), (3, 700)] # (rank, number of measurements)
step_size = 0.25
init_step_size_bt = 2 * step_size
stop_NCRPG = atol
stop_RADMM = stop_NCRPG * step_size # RADMM checks the norm of the scaled step -η*grad, so the tolerance is scaled accordingly
# Mean ℓ2-norm of those rows of X (in SVD format) that are zero in the ground truth
function mean_error_zero_rows(Mr, X, zero_rows)
    err = 0.0
    for j in zero_rows
        err += norm(X.U[j, :] .* X.S)
    end
    return err/length(zero_rows)
end
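
Since the rows of the factor $V^{\mathrm{T}}$ of a point $X = U \operatorname{diag}(S) V^{\mathrm{T}}$ in SVD format are orthonormal, $\Vert (U \operatorname{diag}(S))_{j, \cdot} \Vert_2 = \Vert X_{j, \cdot} \Vert_2$, so the utility function above returns

\[\frac{1}{\vert J \vert} \sum_{j \in J} \Vert X_{j, \cdot} \Vert_2,\]

where $J$ is the set of row indices that are zero in the ground truth.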

We define a function to generate the test data for the row-sparse low-rank matrix recovery problem.

function generate_data(M, N, r, m, s)
    # Generate a rank-r matrix with s non-zero rows
    X = rand(M, r) * transpose(Matrix(qr(rand(N, r)).Q[:, 1:r]))
    smpl = sample(1:M, M - s, replace = false) # indices of the rows set to zero
    X[smpl, :] .= 0.0

    # Normalize
    X = X/norm(X)

    # Generate the Gaussian measurement operator A and the measurements y
    A = rand(Normal(0, 1/sqrt(m)), m, M * N)
    y = A * vec(X)
    return X, A, y, smpl
end
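
As a quick sanity check (not part of the original experiment), one can verify that the generated matrix has rank r, exactly s non-zero rows, and unit Frobenius norm, and that y contains m measurements; the small dimensions and the names X_chk, A_chk, y_chk, zero_rows_chk are introduced only for this illustration.

X_chk, A_chk, y_chk, zero_rows_chk = generate_data(50, 20, 2, 100, 5)
rank(X_chk)                                           # expected: 2
sum(norm(X_chk[i, :]) > 0 for i in 1:size(X_chk, 1))  # expected: 5 non-zero rows
norm(X_chk)                                           # expected: ≈ 1.0
length(y_chk) == size(A_chk, 1)                       # expected: true, that is, m = 100 measurements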

We implement the proximal operator of the $\ell_{1, 2}$-norm, once for points on the fixed-rank manifold in SVD format (used by NCRPG) and once for full matrices (used by RADMM), following [BJJP25b] and [LMS22].
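
In the Euclidean setting this proximal map acts row-wise as soft-thresholding (group shrinkage),

\[\bigl[\operatorname{prox}_{\lambda \Vert \cdot \Vert_{1,2}}(X)\bigr]_{i, \cdot} = \max\Bigl(0,\, 1 - \frac{\lambda}{\Vert X_{i, \cdot} \Vert_2}\Bigr) X_{i, \cdot},\]

which both implementations below apply, either to the factor $U \operatorname{diag}(S)$ of a point in SVD format or to the full matrix.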

# NCRPG: prox of the ℓ₁,₂-norm for a point in SVD format
function prox_norm12_global(Mr::FixedRankMatrices, prox_param, X::SVDMPoint; c = c)
    λ = prox_param * c
    M, N, k = Manifolds.get_parameter(Mr.size)
    YU = zeros(M, k)
    # row-wise soft-thresholding of U*diagm(S)
    for i in 1:M
        normx1_i = norm(X.U[i, :] .* X.S)
        if normx1_i > λ
            YU[i, :] = ((normx1_i - λ)/normx1_i) * X.U[i, :]
        end
    end
    # re-factorize and restore the right factor
    Z = SVDMPoint(YU * diagm(X.S))
    return SVDMPoint(Z.U, Z.S, Z.Vt * X.Vt)
end
#
# RADMM: prox of the ℓ₁,₂-norm for a full matrix
function prox_norm12_global(Mr::FixedRankMatrices, prox_param, X1::Matrix{Float64}; c = c)
    λ = prox_param * c
    M, N, k = Manifolds.get_parameter(Mr.size)
    Y1 = zeros(M, N)
    # row-wise soft-thresholding of the matrix itself
    for i in 1:M
        normx1_i = norm(X1[i, :])
        if normx1_i > λ
            Y1[i, :] = ((normx1_i - λ)/normx1_i) * X1[i, :]
        end
    end
    return Y1
end
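
As a small, hypothetical consistency check (again not part of the original experiment), both variants should yield the same matrix once the factored result is embedded; the names Mr_chk and p_chk are introduced only for this illustration.

Mr_chk = FixedRankMatrices(6, 4, 2)
p_chk = rand(Mr_chk)
q_svd = prox_norm12_global(Mr_chk, 0.01, p_chk; c = 1.0)
q_mat = prox_norm12_global(Mr_chk, 0.01, embed(Mr_chk, p_chk); c = 1.0)
norm(embed(Mr_chk, q_svd) - q_mat)  # expected: ≈ 0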

Next, we define the total objective function, its smooth part $g$ together with the Riemannian gradient, and the nonsmooth part $h$ on the fixed-rank manifold.
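
Since $g$ is the restriction of a smooth function on the embedding space, its Riemannian gradient is the orthogonal projection of the Euclidean gradient onto the tangent space,

\[\operatorname{grad} g(X) = \operatorname{proj}_{T_X \mathcal M}\bigl(\mathbb A^{*}(\mathbb A(X) - y)\bigr),\]

which is what grad_g_global below computes via reshape and project.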

# Objective(s), gradient, and proxes
# ℓ₁,₂-norm of a point in SVD format (sum of the row norms of the embedded matrix)
function norm12(X::SVDMPoint)
    M = size(X.U)[1]
    sum([norm(X.U[i, :] .* X.S) for i = 1:M])
end
# total objective f = g + h, returning NaN once the cost exceeds max_cost
function f_global(M::FixedRankMatrices, X::SVDMPoint, A, c, y, max_cost = 1e5)
    X_vec = vec(embed(M, X))
    cost = 0.5 * (norm(A*X_vec - y))^2 + c * norm12(X)
    if cost >= max_cost
        return NaN
    else
        return cost
    end
end
#
# smooth part g, its Riemannian gradient (tangent projection of the Euclidean gradient), and nonsmooth part h
g_global(M::FixedRankMatrices, X::SVDMPoint, A, c, y) = 0.5*(norm(A*vec(embed(M, X)) - y))^2
function grad_g_global(M::FixedRankMatrices, X::SVDMPoint, A, c, y)
    X_mat = embed(M, X)
    return project(M, X, reshape(A'*(A*vec(X_mat) - y), size(X_mat)))
end
h_global(M::FixedRankMatrices, X::SVDMPoint, A, c, y) = c * norm12(X)

We introduce an implementation of the RADMM method for the row-sparse low-rank matrix recovery problem on the fixed-rank manifold, following [LMS22].
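
Written out, one iteration of the implementation below performs a retracted gradient step on the augmented Lagrangian in $X$, a $Z$-update based on the proximal map of the $\ell_{1,2}$-term (with Moreau smoothing parameter $\gamma$), and a dual update:

\[\begin{aligned}
X_{k+1} &= \operatorname{retr}_{X_k}\Bigl(-\eta\, \operatorname{proj}_{T_{X_k} \mathcal M}\bigl(\mathbb A^{*}(\mathbb A(X_k) - y) + \Lambda_k + \rho\,(X_k - Z_k)\bigr)\Bigr),\\
Z_{k+1} &= \frac{\tfrac{1}{\gamma}\, \operatorname{prox}_{\frac{1 + \gamma \rho}{\rho}\, c \Vert \cdot \Vert_{1,2}}\bigl(X_{k+1} + \tfrac{1}{\rho} \Lambda_k\bigr) + \Lambda_k + \rho\, X_{k+1}}{\tfrac{1}{\gamma} + \rho},\\
\Lambda_{k+1} &= \Lambda_k + \rho\,(X_{k+1} - Z_{k+1}),
\end{aligned}\]

where $X_{k+1}$ in the last two lines denotes the embedded (matrix) representation of the updated iterate.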

"""
\argmin F(X) = 0.5||AX-y||^2 + c *||X||_{1,2}
f(X) = 0.5||AX - y||^2
g(X) = c * ||X||_{1,2}
L_{ρ,γ}(X, Z, Λ) = f(X) + g_γ(Z) + <Λ, X - Z> + ρ/2 * ||X - Z||^2
grad_X L_{ρ,γ}(X, Z, Λ) = project(A'(AX - y) + Λ + ρ(X - Z)) 

"""
function RADMM_blind_deconv(
    A,
    M, # rows
    N, # columns
    rank,
    c,  # penalty parameter 
    y, # signal
    prox_l12;
    η = 1e-1,
    ρ = 0.1,
    γ = 1e-7,
    stop = 1e-8, 
    max_iters  = 100,
    start = 0,
    record = false, 
    ϵ_spars = 1e-5,
    max_cost = 1e5
) 
    flag_succ = false
    Mr = FixedRankMatrices(M, N, rank)
    γρ = γ * ρ
    F(X) = 0.5 * norm(A*vec(X) - y)^2 + c * sum([norm(X[i, :]) for i=1:M])
    grad_augLagr(X::SVDMPoint, X_mat::Matrix, Λ::Matrix, Z::Matrix) = project(Mr, X, reshape(A'*(A*vec(X_mat) - y) + vec(Λ + ρ*(X_mat - Z)), (M, N)))

    if start == 0
        X = rand(Mr)
    else 
        X = start
    end 
    X_mat = embed(Mr, X)
    Z = X_mat
    Λ = zeros(M, N)
    it = -1 
    if !record 
        for i in 1:max_iters 
            descent_dir = -η * grad_augLagr(X, X_mat, Λ, Z)
            X = retract(Mr, X, descent_dir) 
            X_mat = embed(Mr, X)
            Y = prox_l12(Mr, (1 + γρ)/ρ , X_mat + 1/ρ * Λ; c = c)
            Z = (1/γ * Y + Λ + ρ * X_mat)/ (1/γ + ρ)
            Λ = Λ + ρ * (X_mat - Z)
            if (norm(embed(Mr, X, descent_dir)) < stop)
                flag_succ = true 
                it = i 
                break 
            end 
        end
        if it == -1
            it = max_iters 
        end 
        return X, flag_succ, it 
    else 
        Iterates = []
        for i in 1:max_iters 
            descent_dir = -η * grad_augLagr(X, X_mat, Λ, Z)
            X = retract(Mr, X, descent_dir) 
            push!(Iterates, X)
            X_mat = embed(Mr, X)
            Y = prox_l12(Mr, (1 + γρ)/ρ , X_mat + 1/ρ * Λ; c = c)
            Z = (1/γ * Y + Λ + ρ * X_mat)/ (1/γ + ρ)
            Λ = Λ + ρ * (X_mat - Z)
            if (norm(embed(Mr, X, descent_dir)) < stop)
                flag_succ = true 
                it = i 
                break 
            end 
            # if i%100 == 0
            #     println(i, "\t", norm(embed(Mr, X, descent_dir)))
            # end 
        end
        if it == -1 
            it = max_iters 
        end 
        return X, flag_succ, it, Iterates
    end 
end 

We set up some variables to collect the results of the experiments and initialize the dataframes.
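
The definition of these dataframes is not shown in this rendering; a minimal sketch, assuming the column names implied by the push! calls in the loop below and by the result tables, and a hypothetical location for results_folder, could look as follows.

results_folder = joinpath(@__DIR__, "results") # hypothetical results folder
!isdir(results_folder) && mkpath(results_folder)
result_cols = ["M", "N", "m", "r", "s", "stepsize", "time (s)", "error", "iterations", "mean_zero_row_error"]
df_RADMM    = DataFrame([col => Float64[] for col in result_cols])
df_NCRPG    = DataFrame([col => Float64[] for col in result_cols])
df_NCRPG_bt = DataFrame([col => Float64[] for col in result_cols])
distance_cols = ["M", "N", "m", "r", "s", "dist_NCRPG_NCRPG_bt", "dist_NCRPG_RADMM", "dist_NCRPG_bt_RADMM"]
df_distances = DataFrame([col => Float64[] for col in distance_cols])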

We then run the experiments.

for (r, m) in r_m_array
    # Set random seed for reproducibility
    Random.seed!(random_seed)
    #
    # Define manifold
    Mr = FixedRankMatrices(M, N, r) #fixed rank manifold
    #
    # Generate rank r matrix with s non-zero rows
    Sol_mat, A, y, zero_rows = generate_data(M, N, r, m, s)
    Sol = SVDMPoint(Sol_mat, r)
    # Local starting point 
    Y = rand(Normal(0, 0.01/sqrt(r)), M, N)
    start = SVDMPoint(Sol_mat + Y, r)
    dist_start_sol = distance(Mr, Sol, start, OrthographicInverseRetraction())
    # Localize objectives
    f(Mr, X) = f_global(Mr, X, A, c, y)
    g(Mr, X) = g_global(Mr, X, A, c, y)
    grad_g(Mr, X) = grad_g_global(Mr, X, A, c, y)
    h(Mr, X) = h_global(Mr, X, A, c, y)
    prox_norm12(Mr, prox_param, X) = prox_norm12_global(Mr, prox_param, X; c = c)
    #
    # Optimization
    # NCRPG
    rec_NCRPG = proximal_gradient_method(Mr, f, g, grad_g, start; 
        prox_nonsmooth=prox_norm12,
        retraction_method=OrthographicRetraction(),
        inverse_retraction_method=OrthographicInverseRetraction(),
        stepsize = ConstantLength(step_size),
        record=[:Iteration, :Iterate],
        return_state=true,
        debug=[ 
            :Iteration,( "|Δp|: %1.9f |"),
            DebugChange(; inverse_retraction_method= OrthographicInverseRetraction()),
            (:Cost, " F(x): %1.11f | "), 
            "\n", 
            :Stop, 
            100
        ],
        stopping_criterion = StopAfterIteration(max_iters) | StopWhenGradientMappingNormLess(stop_NCRPG)
    )
    bm_NCRPG = @benchmark proximal_gradient_method($Mr, $f, $g, $grad_g, $start; 
        prox_nonsmooth=$prox_norm12,
        retraction_method=OrthographicRetraction(),
        inverse_retraction_method=OrthographicInverseRetraction(),
        stepsize = ConstantLength($step_size),
        stopping_criterion = StopAfterIteration($max_iters) | StopWhenGradientMappingNormLess($stop_NCRPG)
    )
    it_NCRPG, res_NCRPG = get_record(rec_NCRPG)[end]
    # NCRPG with backtracking
    rec_NCRPG_bt = proximal_gradient_method(Mr, f, g, grad_g, start; 
        prox_nonsmooth=prox_norm12,
        retraction_method=OrthographicRetraction(),
        inverse_retraction_method=OrthographicInverseRetraction(),
        stepsize = ProximalGradientMethodBacktracking(;             
            strategy=:nonconvex, 
            initial_stepsize=init_step_size_bt
        ),
        record=[:Iteration, :Iterate],
        return_state=true,
        debug=[ 
            :Iteration,( "|Δp|: %1.9f |"),
            DebugChange(; inverse_retraction_method=OrthographicInverseRetraction()),
            (:Cost, " F(x): %1.11f | "), 
            "\n", 
            :Stop, 
            100
        ],
        stopping_criterion = StopAfterIteration(max_iters) | StopWhenGradientMappingNormLess(stop_NCRPG)
    )
    bm_NCRPG_bt = @benchmark proximal_gradient_method($Mr, $f, $g, $grad_g, $start; 
        prox_nonsmooth=$prox_norm12,
        retraction_method=OrthographicRetraction(),
        inverse_retraction_method=OrthographicInverseRetraction(),
        stepsize = ProximalGradientMethodBacktracking(; strategy=:nonconvex, initial_stepsize=$init_step_size_bt),
        stopping_criterion = StopAfterIteration($max_iters) | StopWhenGradientMappingNormLess($stop_NCRPG)
    )
    it_NCRPG_bt, res_NCRPG_bt = get_record(rec_NCRPG_bt)[end]
    # RADMM
    res_RADMM, succ, it_RADMM = RADMM_blind_deconv(A, M, N, r, c, y, prox_norm12_global; 
        max_iters  = max_iters , 
        start = start, 
        η = step_size,  
        stop = stop_RADMM
    )  
    bm_RADMM = @benchmark RADMM_blind_deconv($A, $M, $N, $r, $c, $y, $prox_norm12_global; 
        max_iters = max_iters, 
        start = $start, 
        η = $step_size,  
        stop = stop_RADMM
    )  
    #
    # Collect results
    # Distances between the results
    dist_NCRPG_RADMM = distance(Mr, res_NCRPG, res_RADMM, OrthographicInverseRetraction())
    dist_NCRPG_bt_RADMM = distance(Mr, res_NCRPG_bt, res_RADMM, OrthographicInverseRetraction())
    dist_NCRPG_NCRPG_bt = distance(Mr, res_NCRPG, res_NCRPG_bt, OrthographicInverseRetraction())
    # Times
    time_RADMM    = median(bm_RADMM   ).time/1e9
    time_NCRPG    = median(bm_NCRPG   ).time/1e9
    time_NCRPG_bt = median(bm_NCRPG_bt).time/1e9
    # Errors
    error_RADMM    = distance(Mr, Sol, res_RADMM,    OrthographicInverseRetraction())
    error_NCRPG    = distance(Mr, Sol, res_NCRPG,    OrthographicInverseRetraction())
    error_NCRPG_bt = distance(Mr, Sol, res_NCRPG_bt, OrthographicInverseRetraction())
    # Mean zero row errors
    mean_zero_row_error_NCRPG    = mean_error_zero_rows(Mr, res_NCRPG, zero_rows    )
    mean_zero_row_error_NCRPG_bt = mean_error_zero_rows(Mr, res_NCRPG_bt, zero_rows )
    mean_zero_row_error_RADMM    = mean_error_zero_rows(Mr, res_RADMM, zero_rows    )
    #
    # Push results to dataframes
    push!(df_RADMM, 
        [
            M, N, m, r, s,
            step_size,
            time_RADMM, 
            error_RADMM,
            it_RADMM,
            mean_zero_row_error_RADMM
        ]
    )
    push!(df_NCRPG, 
        [
            M, N, m, r, s,
            step_size,
            time_NCRPG, 
            error_NCRPG,
            it_NCRPG,
            mean_zero_row_error_NCRPG
        ]
    )
    push!(df_NCRPG_bt, 
        [
            M, N, m, r, s,
            init_step_size_bt,
            time_NCRPG_bt, 
            error_NCRPG_bt,
            it_NCRPG_bt,
            mean_zero_row_error_NCRPG_bt
        ]
    )
    push!(df_distances, 
        [
            M, N, m, r, s,
            dist_NCRPG_NCRPG_bt,
            dist_NCRPG_RADMM,
            dist_NCRPG_bt_RADMM
        ]
    )
end

We export the results to CSV files.

CSV.write(joinpath(results_folder, "results-fixed-rank-RADMM.csv"), df_RADMM)
CSV.write(joinpath(results_folder, "results-fixed-rank-NCRPG.csv"), df_NCRPG)
CSV.write(joinpath(results_folder, "results-fixed-rank-NCRPG-bt.csv"), df_NCRPG_bt)
CSV.write(joinpath(results_folder, "results-fixed-rank-distances.csv"), df_distances )

We can take a look at how the algorithms compare to each other in their performance with the following tables. The first table shows the performance of RADMM.

| M | N | m | r | s | stepsize | time (s) | error | iterations | mean_zero_row_error |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 500.0 | 100.0 | 300.0 | 1.0 | 10.0 | 0.25 | 16.6644 | 0.000528121 | 431.0 | 5.37441e-9 |
| 500.0 | 100.0 | 500.0 | 2.0 | 10.0 | 0.25 | 19.1971 | 0.000725431 | 1354.0 | 6.09862e-9 |
| 500.0 | 100.0 | 700.0 | 3.0 | 10.0 | 0.25 | 32.7179 | 0.000772805 | 1414.0 | 7.4266e-9 |

The next table shows the performance of NCRPG with a constant stepsize.

| M | N | m | r | s | stepsize | time (s) | error | iterations | mean_zero_row_error |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 500.0 | 100.0 | 300.0 | 1.0 | 10.0 | 0.25 | 10.475 | 0.000528145 | 1049.0 | 0.0 |
| 500.0 | 100.0 | 500.0 | 2.0 | 10.0 | 0.25 | 20.4298 | 0.000725859 | 1047.0 | 1.15293e-20 |
| 500.0 | 100.0 | 700.0 | 3.0 | 10.0 | 0.25 | 22.6698 | 0.000775127 | 1120.0 | 4.22854e-20 |

The next table shows the performance of NCRPG with a backtracked stepsize. In this case, the column "stepsize" indicates the initial stepsize for the backtracking procedure.

| M | N | m | r | s | stepsize | time (s) | error | iterations | mean_zero_row_error |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 500.0 | 100.0 | 300.0 | 1.0 | 10.0 | 0.5 | 16.2641 | 0.000528144 | 562.0 | 0.0 |
| 500.0 | 100.0 | 500.0 | 2.0 | 10.0 | 0.5 | 31.1954 | 0.000725847 | 604.0 | 1.86904e-20 |
| 500.0 | 100.0 | 700.0 | 3.0 | 10.0 | 0.5 | 3643.83 | 0.000778708 | 5000.0 | 6.3238e-20 |

Finally, we look at the distances between the solutions found by the different algorithms.

| M | N | m | r | s | dist_NCRPG_NCRPG_bt | dist_NCRPG_RADMM | dist_NCRPG_bt_RADMM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 500.0 | 100.0 | 300.0 | 1.0 | 10.0 | 1.08617e-8 | 5.59207e-7 | 5.49924e-7 |
| 500.0 | 100.0 | 500.0 | 2.0 | 10.0 | 2.18404e-8 | 7.53362e-7 | 7.33125e-7 |
| 500.0 | 100.0 | 700.0 | 3.0 | 10.0 | 1.38488e-5 | 2.39003e-5 | 2.51751e-5 |

Technical details

This tutorial is cached. It was last run on the following package versions.

using Pkg
Pkg.status()
Status `~/Repositories/Julia/ManoptExamples.jl/examples/Project.toml`
  [6e4b80f9] BenchmarkTools v1.6.0
  [336ed68f] CSV v0.10.15
  [13f3f980] CairoMakie v0.15.6
  [0ca39b1e] Chairmarks v1.3.1
  [35d6a980] ColorSchemes v3.31.0
⌅ [5ae59095] Colors v0.12.11
  [a93c6f00] DataFrames v1.8.0
  [31c24e10] Distributions v0.25.120
  [7073ff75] IJulia v1.30.4
  [682c06a0] JSON v0.21.4
  [8ac3fa9e] LRUCache v1.6.2
  [b964fa9f] LaTeXStrings v1.4.0
  [d3d80556] LineSearches v7.4.0
  [ee78f7c6] Makie v0.24.6
  [af67fdf4] ManifoldDiff v0.4.4
  [1cead3c2] Manifolds v0.10.23
  [3362f125] ManifoldsBase v1.2.0
  [0fc0a36d] Manopt v0.5.23 `../../Manopt.jl`
  [5b8d5e80] ManoptExamples v0.1.15 `..`
  [51fcb6bd] NamedColors v0.2.3
  [91a5bcdd] Plots v1.41.1
  [08abe8d2] PrettyTables v3.0.11
  [6099a3de] PythonCall v0.9.28
  [f468eda6] QuadraticModels v0.9.14
  [1e40b3f8] RipQP v0.7.0
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`

This tutorial was last rendered October 3, 2025, 15:22:53.

Literature

[BJJP25b]
R. Bergmann, H. Jasa, P. J. John and M. Pfeffer. The Intrinsic Riemannian Proximal Gradient Method for Nonconvex Optimization, preprint (2025), arXiv:2506.09775.
[HW21]
W. Huang and K. Wei. Riemannian proximal gradient methods. Mathematical Programming 194, 371–413 (2021).