How to improve the efficiency on a single sample? #439

dudududu12138 opened this issue Jul 9, 2024 · 3 comments

@dudududu12138

dudududu12138 commented Jul 9, 2024

Hi, I want to run bambu (v3.5.1) on a single sample, but the ncore parameter seems to parallelize across multiple samples rather than within one. So I tested different parameter combinations to improve bambu's efficiency on a single sample, using a small BAM file (~70 MB), but I haven't found the right combination. How should I set the parameters to increase the running speed? Currently I have 5 CPUs allocated, but bambu only uses 1 and the remaining 4 sit idle.

1. Test1: just set ncore=5

se<-bambu(reads='test.bam', annotations=bambuAnnotations, genome=ref, ncore=5,quant=TRUE,discovery=TRUE,trackReads=TRUE,yieldSize=1e5)

CPU efficiency was very low (~10%):
[screenshot: CPU utilization ~10%]

2. Test2: Set lowMemory=TRUE and ncore=5

se<-bambu(reads='test.bam', annotations=bambuAnnotations, genome=ref, ncore=5,quant=TRUE,discovery=TRUE,trackReads=TRUE,lowMemory=TRUE,yieldSize=1e5)

Efficiency improved a little (~20%), but with lowMemory=TRUE the run reported errors:
[screenshot: CPU utilization ~20%]

Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in if (annotatedIntronNumberNew > annotatedIntronNumber & !is.na(annotatedIntronNumber)) {: missing value where TRUE/FALSE needed
In addition: There were 22 warnings (use warnings() to see them)
Execution halted
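For context (this is generic R behavior, not specific to bambu): the "missing value where TRUE/FALSE needed" error is raised whenever the condition inside if() evaluates to NA instead of TRUE or FALSE. A minimal reproduction, reusing the variable names from the traceback with illustrative values:

```r
annotatedIntronNumberNew <- NA  # illustrative: suppose the new value came back NA
annotatedIntronNumber    <- 5   # illustrative: the old value is a normal number

# `&` does not short-circuit in R: NA > 5 yields NA, and NA & TRUE is NA,
# so if() has no TRUE/FALSE to branch on and throws the error seen above.
result <- tryCatch(
  if (annotatedIntronNumberNew > annotatedIntronNumber & !is.na(annotatedIntronNumber)) "yes" else "no",
  error = function(e) conditionMessage(e)
)
print(result)  # prints the error message instead of "yes"/"no"
```

This suggests the comparison in bambu received an NA value (e.g. from a scaffold with too few reads to compute an intron count), which matches the guess further down in this thread.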

3. Test3: set ncore=5 and yieldSize=1e9 (increased from 1e5)

se<-bambu(reads='test.bam', annotations=bambuAnnotations, genome=ref, ncore=5,quant=TRUE,discovery=TRUE,trackReads=TRUE,yieldSize=1e9)

Slightly higher efficiency than Test1. Should I increase the value of yieldSize further?
[screenshot: CPU utilization]

@andredsim
Collaborator

Hi,

You are correct that ncore mainly speeds up performance when running multiple samples, as we distribute each sample to a different core for read class construction and quantification. For a single sample, increasing yieldSize will speed up performance, since it raises the number of reads processed at a time from the BAM file; this comes as a trade-off with memory usage, so be careful there.
Theoretically it would be possible for us to distribute read class construction and quantification of a single sample across multiple cores, and it is something we should look at implementing as an optional parameter in the future; sadly, it is not yet implemented.

Hope this answers your questions.
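To illustrate the trade-off described above, a sketch of the single-sample call (parameters taken from the calls earlier in this thread; the yieldSize values are illustrative assumptions, not tuned recommendations):

```r
library(bambu)

# Larger yieldSize => more reads pulled from the BAM per chunk => fewer,
# bigger chunks => faster single-sample runs, but higher peak memory.
# Increase stepwise (1e5 -> 1e6 -> 1e7 ...) while watching RAM usage.
se <- bambu(
  reads       = "test.bam",
  annotations = bambuAnnotations,
  genome      = ref,
  ncore       = 1,    # extra cores give little benefit for a single sample
  yieldSize   = 1e6,  # illustrative value; raise only as far as memory allows
  quant       = TRUE,
  discovery   = TRUE
)
```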

@dudududu12138
Author

Thanks for your reply. By the way, do you know the reason for the error in Test2? I think it is caused by setting lowMemory=TRUE, but I don't know why.

@andredsim
Collaborator

Hi,
Sorry for the late reply here. I am not 100% certain why this is happening. My guess is that when lowMemory is on, bambu may be trying to generate read classes on one of the minor scaffolds that has only a few reads. This is something we would like to catch, but I will need to try to replicate the error on my side. If you want to test this, you could remove reads from all scaffolds that have fewer than 1000 reads and try again.
Kind Regards,
Andre Sim
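One possible way to carry out Andre's suggestion (a sketch assuming samtools is installed and test.bam is coordinate-sorted and indexed; all file names are illustrative):

```shell
# Column 3 of `samtools idxstats` is the mapped-read count per reference
# sequence; keep only the names of scaffolds with at least 1000 reads.
samtools idxstats test.bam \
  | awk '$3 >= 1000 {print $1}' > keep_scaffolds.txt

# Extract alignments on those scaffolds into a new BAM and re-index it.
samtools view -b test.bam $(cat keep_scaffolds.txt) > filtered.bam
samtools index filtered.bam
```

Then point bambu at filtered.bam instead of test.bam to see whether the lowMemory=TRUE error goes away.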
