Inconsistent z-score calculation for CSC and CSR sparse matrices #288

Closed
jkanche opened this issue Feb 12, 2024 · 4 comments

Comments

@jkanche
Contributor

jkanche commented Feb 12, 2024

The z-score calculation in Pegasus produces inconsistent results based on the type of input matrix (CSC or CSR). The following code snippet demonstrates the issue:

import numpy as np
import pandas as pd
import anndata as ad
from scipy.sparse import csr_matrix

counts = csr_matrix(np.random.poisson(1, size=(100, 2000)), dtype=np.float32)

adata_csr = ad.AnnData(counts)
adata_csc = ad.AnnData(counts.tocsc())

import pegasus

print(pegasus.__version__)
# 1.9.0

np.allclose(pegasus.calculate_z_score(adata_csc), pegasus.calculate_z_score(adata_csr))
# 2024-02-12 14:41:34,223 - pegasus.tools.signature_score - WARNING - Detected and dropped duplicate bin edges!
# 2024-02-12 14:41:34,245 - pegasus.tools.signature_score - WARNING - Detected and dropped duplicate bin edges!
# False

Upon investigation, the root cause seems to be in this line. The code should check the storage format (CSR or CSC) of the sparse matrix before computing the mean and standard deviation. Coercing the matrix to CSR format might be necessary, if I understand the sparse functions here correctly.
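
To illustrate why the storage format matters, here is a minimal sketch; it is not Pegasus code. The helper `col_means_from_csr_arrays` is hypothetical and assumes that `indices` holds column indices, which is only true for CSR. Given the raw arrays of a CSC matrix, it runs without error and returns wrong numbers.

```python
import numpy as np
from scipy.sparse import csr_matrix


def col_means_from_csr_arrays(data, indices, n_rows, n_cols):
    """Hypothetical helper: column means from raw CSR arrays.

    Treats `indices` as column indices, which holds only for CSR input.
    """
    col_sums = np.bincount(indices, weights=data, minlength=n_cols)
    return col_sums / n_rows


dense = np.array([[1, 0, 2, 0],
                  [0, 3, 0, 0],
                  [4, 0, 0, 5]], dtype=np.float64)
X_csr = csr_matrix(dense)
X_csc = X_csr.tocsc()
n_rows, n_cols = dense.shape

# Same values, different layout: in CSR, `indices` stores column indices
# and `indptr` has n_rows + 1 entries; in CSC, `indices` stores row indices
# and `indptr` has n_cols + 1 entries.
from_csr = col_means_from_csr_arrays(X_csr.data, X_csr.indices, n_rows, n_cols)
from_csc = col_means_from_csr_arrays(X_csc.data, X_csc.indices, n_rows, n_cols)
expected = dense.mean(axis=0)

print(np.allclose(from_csr, expected))  # True
print(np.allclose(from_csc, expected))  # False: row indices were read as columns
```

No exception is raised in the CSC case, which is why the mismatch only shows up when results are compared across formats.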

Environment:

  • Pegasus version: 1.9.0
  • Python version: 3.11

@yihming
Member

yihming commented Feb 28, 2024

Hi @jkanche. PR #290 should fix this issue.

In brief, both calc_mean and calc_sig_background, which are called by calculate_z_score, need to convert the count matrix to a csr_matrix if it is not one already.
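
A guard of this kind could look roughly like the sketch below. This is an illustration only, not the code from PR #290: `ensure_csr` and `col_mean_std` are hypothetical names, and the statistics are computed with plain NumPy rather than Pegasus's own routines. Columns are treated as genes, following the AnnData convention of cells in rows.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def ensure_csr(X):
    """Hypothetical guard: return X in CSR format, converting only if needed."""
    return X.tocsr() if issparse(X) else csr_matrix(X)


def col_mean_std(X):
    """Per-column mean and population std from raw CSR arrays."""
    X = ensure_csr(X)  # after this, X.indices are guaranteed to be column indices
    n_rows, n_cols = X.shape
    s1 = np.bincount(X.indices, weights=X.data, minlength=n_cols)
    s2 = np.bincount(X.indices, weights=X.data ** 2, minlength=n_cols)
    mean = s1 / n_rows
    var = np.maximum(s2 / n_rows - mean ** 2, 0.0)  # clip tiny negative round-off
    return mean, np.sqrt(var)


rng = np.random.default_rng(0)
dense = rng.poisson(1, size=(100, 50)).astype(np.float64)
X_csr = csr_matrix(dense)
X_csc = X_csr.tocsc()

mean_r, std_r = col_mean_std(X_csr)
mean_c, std_c = col_mean_std(X_csc)

print(np.allclose(mean_r, mean_c) and np.allclose(std_r, std_c))  # True
```

Calling `tocsr()` on a matrix that is already CSR does not copy by default, so the guard adds little cost when the input is already in the expected format.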

@jkanche
Contributor Author

jkanche commented Feb 28, 2024

I believe this affects most of your other functions like calc_mean, especially those that accept indices, indptr and data directly, as well as all upstream methods that call these functions.
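
One way to find such functions is a format-invariance check: run the same function on CSR and CSC versions of one matrix and compare the results. The sketch below is an assumption about how that could be done, not an existing Pegasus test; `check_format_invariant`, `unguarded_col_sums` and `guarded_col_sums` are hypothetical names.

```python
import numpy as np
from scipy.sparse import csr_matrix


def check_format_invariant(func, dense, formats=("csr", "csc")):
    """Return True if func gives the same result for every listed sparse format."""
    reference = np.asarray(func(csr_matrix(dense)))
    for fmt in formats:
        result = np.asarray(func(csr_matrix(dense).asformat(fmt)))
        if result.shape != reference.shape or not np.allclose(result, reference):
            return False
    return True


def unguarded_col_sums(X):
    # Reads X.indices as column indices, which is only valid for CSR input.
    return np.bincount(X.indices, weights=X.data, minlength=X.shape[1])


def guarded_col_sums(X):
    # Same computation, but coerces to CSR first.
    return unguarded_col_sums(X.tocsr())


dense = np.array([[1, 0, 2, 0],
                  [0, 3, 0, 0],
                  [4, 0, 0, 5]], dtype=np.float64)

print(check_format_invariant(unguarded_col_sums, dense))  # False
print(check_format_invariant(guarded_col_sums, dense))    # True
```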

@yihming
Member

yihming commented Mar 1, 2024

Yes, I also see that. I may add this guard to them as well.

Specifically for your case, please let me know if the issue still persists.

@yihming
Member

yihming commented Mar 16, 2024

The fix for this issue is released in version 1.9.1 (https://pegasus.readthedocs.io/en/stable/#march-16-2024). I'll close this issue, but feel free to reopen it if it persists.

@yihming yihming closed this as completed Mar 16, 2024