cytotrace2() function takes a very long time to run on 48k cells #44

shaln · 2024-11-23T15:44:56Z

Hi,

I tested cytotrace2 earlier this year on a dataset and could run all functions with no issues. I recently revisited cytotrace2 after optimising other steps in our workflow (clustering parameters etc.), but I started running into the following issue even though it is the same dataset. For reference, I'm running the following on an HPC cluster.

> all.samples <- readRDS("RDS Files/test/allsamples_clusters.RDS")

> # Extract annotations
> condition <- all.samples@meta.data$condition
> sample <- all.samples@meta.data$orig.ident
> clusters <- all.samples@meta.data$seurat_clusters
> cluster.cond <- all.samples@meta.data$cluster.cond
> cluster.sample <- all.samples@meta.data$cluster.sample

> cytotrace2_result <- cytotrace2(all.samples, is_seurat = TRUE, slot_type = "counts", species = "human", seed = 123)

cytotrace2: Started loading data
Dataset contains 38606 genes and 48366 cells.
cytotrace2: Running on 5 subsample(s) approximately of length 10000
cytotrace2: Started running on subsample(s). This will take a few minutes.
cytotrace2: Started preprocessing.
The function expects an input of type 'data.frame' or 'data.table'.
Attempting to convert the provided input to the required format.
13969 input genes mapped to model genes.
cytotrace2: Started prediction.
This section will run using  5 / 64 core(s).

It got stuck at the step above overnight and never progressed past it. I thought it might be a memory issue, so I tried the following but still had the same issue:

- Increasing the allocation from 4 CPU core slots (32GB per slot) to 10 CPU core slots (32GB per slot).
- Keeping only the variable features, so 2000 genes and 48366 cells.
- Extracting the expression data from the Seurat object and passing that to CytoTRACE2 instead of the Seurat object itself.

> expression_data <- as.data.frame(all.samples[["RNA"]]$counts)
> cytotrace2_result <- cytotrace2(expression_data, species = "human", seed = 123)

Any idea if this is still a memory issue? How much memory would a dataset of this size usually require? Any help is much appreciated! :)
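As a rough guide to the memory question, the dense size of the full count matrix can be estimated from the dimensions in the log above. This is only a back-of-the-envelope sketch: it assumes every value is stored as an 8-byte double after `as.data.frame()` densifies the counts, and it says nothing about how much working memory cytotrace2 itself needs on top of that. The variable names are illustrative.

```r
# Dimensions reported in the cytotrace2 log above.
n_genes <- 38606
n_cells <- 48366

# Assumption: each entry is held as an 8-byte double once the sparse
# counts are converted to a dense data.frame.
bytes_per_value <- 8

n_entries <- n_genes * n_cells
dense_gib <- n_entries * bytes_per_value / 1024^3

cat(sprintf("%.0f entries, about %.1f GiB dense\n", n_entries, dense_gib))
```

Under these assumptions the dense table alone comes to roughly 13.9 GiB, which fits within a single 32GB slot, so the input size by itself would not obviously explain a hang.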


getmeoutofthisloop · 2024-11-29T16:03:30Z

I have run into the same problem :(

savagyan00 · 2024-12-05T04:04:37Z

Hello,

I am sorry to hear you are experiencing issues running the tool on an HPC cluster. Depending on the cluster's resource allocation strategy, parallelization and multithreading might be what slows the tool down or even causes it to hang. Could you please try disabling parallelization by setting parallelize_models = FALSE and parallelize_smoothing = FALSE, then rerun? This might take a little longer, but if the problem is in fact multithreading, it should run through without issues.

Please let us know if you're able to rerun with the suggested parameters and if it works for you.
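For reference, the suggested rerun might look like the sketch below. It reuses the arguments and file path from the original post and only adds the two parallelization switches. The `cytotrace2_args` list and the `requireNamespace()`/`file.exists()` guard are illustrative additions, so the snippet does nothing if the package or the file is missing.

```r
# Arguments from the original call in this thread, plus the two switches
# suggested above to turn off parallel execution.
cytotrace2_args <- list(
  is_seurat = TRUE,
  slot_type = "counts",
  species = "human",
  seed = 123,
  parallelize_models = FALSE,
  parallelize_smoothing = FALSE
)

# Path taken from the original post; adjust to your own file.
rds_path <- "RDS Files/test/allsamples_clusters.RDS"

# Only run when the package and the input file are actually available.
if (requireNamespace("CytoTRACE2", quietly = TRUE) && file.exists(rds_path)) {
  all.samples <- readRDS(rds_path)
  cytotrace2_result <- do.call(
    CytoTRACE2::cytotrace2,
    c(list(all.samples), cytotrace2_args)
  )
}
```

Keeping the arguments in a list makes it easy to flip the two switches back to TRUE later and compare run times on the same input.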

savagyan00 · 2025-01-13T17:41:15Z

Hello @shaln,

I am just following up to see if the suggestion worked and if you could run the tool. We also have a new release of CytoTRACE 2 with enhanced performance (v1.1.0) which you can try out if interested.

Let us know if we can help with anything!

savagyan00 closed this as completed Jan 13, 2025

savagyan00 reopened this Jan 13, 2025
