cytotrace2() function takes a very long time to run on 48k cells #44

Open
shaln opened this issue Nov 23, 2024 · 3 comments

Comments

@shaln

shaln commented Nov 23, 2024

Hi,

I tested cytotrace2 earlier this year on a dataset and could run all functions without issues. I recently revisited cytotrace2 after optimising other steps in our workflow (clustering parameters, etc.), but I started running into the following issue even though it is the same dataset. For reference, I'm running the following on an HPC cluster.

> all.samples <- readRDS("RDS Files/test/allsamples_clusters.RDS")

> # Extract annotations
> condition <- all.samples@meta.data$condition
> sample <- all.samples@meta.data$orig.ident
> clusters <- all.samples@meta.data$seurat_clusters
> cluster.cond <- all.samples@meta.data$cluster.cond
> cluster.sample <- all.samples@meta.data$cluster.sample

> cytotrace2_result <- cytotrace2(all.samples, is_seurat = TRUE, slot_type = "counts", species = "human", seed = 123)

cytotrace2: Started loading data
Dataset contains 38606 genes and 48366 cells.
cytotrace2: Running on 5 subsample(s) approximately of length 10000
cytotrace2: Started running on subsample(s). This will take a few minutes.
cytotrace2: Started preprocessing.
The function expects an input of type 'data.frame' or 'data.table'.
Attempting to convert the provided input to the required format.
13969 input genes mapped to model genes.
cytotrace2: Started prediction.
This section will run using  5 / 64 core(s).

It gets stuck at the step above overnight and never goes past it. I thought it might be a memory issue, so I tried the following, but still had the same problem:

  1. Increasing the resources from 4 slots (CPU cores) with 32GB per slot to 10 slots with 32GB each.
  2. Keeping only the variable features, i.e. 2000 genes and 48366 cells.
  3. Extracting the count matrix from the Seurat object and passing that to CytoTRACE2 instead of the Seurat object:
> expression_data <- as.data.frame(all.samples[["RNA"]]$counts)
> cytotrace2_result <- cytotrace2(expression_data, species = "human", seed = 123)

Any idea if this is still a memory issue? How much memory would a dataset of this size usually require? Any help is much appreciated! :)
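For a rough sense of scale, the footprint of a dense copy of the count matrix can be estimated from the dimensions printed in the log. This is a back-of-the-envelope sketch, not a figure from the CytoTRACE 2 documentation: dense_matrix_gib is a hypothetical helper, it assumes 8 bytes per value, and it ignores R object overhead and any intermediate copies, so actual peak usage during preprocessing and prediction will be higher. Note that a data.frame is dense, so converting the sparse counts matrix with as.data.frame() allocates storage for every zero entry.

```r
# Rough lower bound on the memory needed to hold a count matrix densely.
# Assumption: every entry is stored as an 8-byte double, as in a base R
# numeric matrix or a data.frame of numeric columns. Object overhead,
# row/column names, and copies made during processing are not counted.
dense_matrix_gib <- function(n_genes, n_cells, bytes_per_value = 8) {
  n_genes * n_cells * bytes_per_value / 1024^3
}

# Full dataset as reported in the log: 38606 genes x 48366 cells
full_gib <- dense_matrix_gib(38606, 48366)

# Variable features only: 2000 genes x 48366 cells
hvg_gib <- dense_matrix_gib(2000, 48366)

cat(sprintf("All genes:      %.2f GiB\n", full_gib))
cat(sprintf("2000 HVGs only: %.2f GiB\n", hvg_gib))
```

With these dimensions it prints about 13.91 GiB for all genes and about 0.72 GiB for the 2000 variable features, so the 2000-gene run is unlikely to be limited by the size of the input matrix alone.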

@getmeoutofthisloop

I have run into the same problem :(

@savagyan00
Contributor

Hello,

I am sorry to hear you are experiencing issues running the tool on an HPC cluster. Depending on how the cluster allocates resources, parallelization and multithreading may slow the tool down or even cause it to hang. Could you please try disabling parallelization by setting parallelize_models = FALSE and parallelize_smoothing = FALSE, and rerun? This may take a little longer, but if multithreading is in fact the problem, it should run through without issues.
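A minimal sketch of the suggested rerun, reusing the arguments from the original call and changing only the two parallelization flags. run_cytotrace2_serial is a hypothetical wrapper name, and the sketch assumes the package is installed under the name CytoTRACE2; because R evaluates the function body lazily, the package is only needed when the wrapper is actually called.

```r
# Hypothetical wrapper around CytoTRACE2::cytotrace2() that keeps the
# arguments from the original call but disables both parallelization
# options suggested above. Assumes the CytoTRACE2 package is installed.
run_cytotrace2_serial <- function(seurat_obj, seed = 123) {
  CytoTRACE2::cytotrace2(
    seurat_obj,
    is_seurat = TRUE,
    slot_type = "counts",
    species = "human",
    parallelize_models = FALSE,    # suggested above: no parallel models
    parallelize_smoothing = FALSE, # suggested above: no parallel smoothing
    seed = seed
  )
}

# Usage, with the Seurat object loaded as in the original post:
# all.samples <- readRDS("RDS Files/test/allsamples_clusters.RDS")
# cytotrace2_result <- run_cytotrace2_serial(all.samples)
```

Wrapping the call keeps the serial settings in one place, so the same configuration can be reused on subsets (for example, the 2000-gene object) when narrowing down whether multithreading or memory is the bottleneck.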

Please let us know if you're able to rerun with the suggested parameters and if it works for you.

@savagyan00
Contributor

Hello @shaln,

I am just following up to see whether the suggestion worked and you were able to run the tool. We have also released a new version of CytoTRACE 2 (v1.1.0) with improved performance, which you can try if you are interested.

Let us know if we can help with anything!
