How to improve the efficiency on a single sample? #439

dudududu12138 opened this issue Jul 9, 2024 · 3 comments

@dudududu12138

dudududu12138 commented Jul 9, 2024

Hi, I want to run bambu (v3.5.1) on a single sample, but the ncore parameter seems to parallelize across multiple samples rather than within one. So I tested different parameter combinations to improve bambu's efficiency on a single sample, using a small BAM file (~70 MB), but I haven't found the right combination. How should I set the parameters to increase the running speed? Currently I have 5 CPUs allocated, but bambu only uses 1 and the remaining 4 sit idle.

1. Test1: just set ncore=5

se<-bambu(reads='test.bam', annotations=bambuAnnotations, genome=ref, ncore=5,quant=TRUE,discovery=TRUE,trackReads=TRUE,yieldSize=1e5)

CPU efficiency was very low (~10%):
[screenshot: CPU utilization ~10%]

2. Test2: Set lowMemory=TRUE and ncore=5

se<-bambu(reads='test.bam', annotations=bambuAnnotations, genome=ref, ncore=5,quant=TRUE,discovery=TRUE,trackReads=TRUE,lowMemory=TRUE,yieldSize=1e5)

Efficiency improved a little (~20%), but with lowMemory=TRUE the run reported errors:
[screenshot: CPU utilization ~20%]

Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in if (annotatedIntronNumberNew > annotatedIntronNumber & !is.na(annotatedIntronNumber)) {: missing value where TRUE/FALSE needed
In addition: There were 22 warnings (use warnings() to see them)
Execution halted
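For context (this is generic R behavior, not specific to bambu): the "missing value where TRUE/FALSE needed" error is raised whenever the condition inside if() evaluates to NA instead of TRUE or FALSE. A minimal reproduction, reusing the variable names from the traceback with illustrative values:

```r
annotatedIntronNumberNew <- NA  # illustrative: suppose the new value came back NA
annotatedIntronNumber    <- 5   # illustrative: the old value is a normal number

# `&` does not short-circuit in R: NA > 5 yields NA, and NA & TRUE is NA,
# so if() has no TRUE/FALSE to branch on and throws the error seen above.
result <- tryCatch(
  if (annotatedIntronNumberNew > annotatedIntronNumber & !is.na(annotatedIntronNumber)) "yes" else "no",
  error = function(e) conditionMessage(e)
)
print(result)  # prints the error message instead of "yes"/"no"
```

This suggests the comparison in bambu received an NA value (e.g. from a scaffold with too few reads to compute an intron count), which matches the guess further down in this thread.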

3. Test3: set ncore=5 and yieldSize=1e9 (increased from 1e5)

se<-bambu(reads='test.bam', annotations=bambuAnnotations, genome=ref, ncore=5,quant=TRUE,discovery=TRUE,trackReads=TRUE,yieldSize=1e9)

Slightly higher efficiency than Test1. Should I increase the value of yieldSize further?
[screenshot: CPU utilization]

@andredsim
Collaborator

Hi,

You are correct that ncore mainly speeds up performance when running multiple samples, as we distribute each sample to a different core for read class construction and quantification. For a single sample, increasing yieldSize will speed up performance, since it raises the number of reads processed at a time from the BAM file; this comes as a trade-off with memory usage, so be careful there.
Theoretically it would be possible for us to distribute read class construction and quantification of a single sample across multiple cores, and it is something we should look at implementing as an optional parameter in the future; sadly, it is not yet implemented.

Hope this answers your questions.
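To illustrate the trade-off described above, a sketch of the single-sample call (parameters taken from the calls earlier in this thread; the yieldSize values are illustrative assumptions, not tuned recommendations):

```r
library(bambu)

# Larger yieldSize => more reads pulled from the BAM per chunk => fewer,
# bigger chunks => faster single-sample runs, but higher peak memory.
# Increase stepwise (1e5 -> 1e6 -> 1e7 ...) while watching RAM usage.
se <- bambu(
  reads       = "test.bam",
  annotations = bambuAnnotations,
  genome      = ref,
  ncore       = 1,    # extra cores give little benefit for a single sample
  yieldSize   = 1e6,  # illustrative value; raise only as far as memory allows
  quant       = TRUE,
  discovery   = TRUE
)
```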

@dudududu12138
Author

Thanks for your reply. By the way, do you know the reason for the error in Test2? I think it is caused by setting lowMemory=TRUE, but I don't know why.

@andredsim
Collaborator

Hi,
Sorry for the late reply here. I am not 100% certain why this is happening. My guess is that when lowMemory is on, bambu may be trying to generate read classes on one of the minor scaffolds that has only a few reads. This is something we would like to catch, but I will need to try to replicate the error on my side. If you want to test this, you could remove reads from all scaffolds that have fewer than 1000 reads and try again.
Kind Regards,
Andre Sim
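One possible way to carry out Andre's suggestion (a sketch assuming samtools is installed and test.bam is coordinate-sorted and indexed; all file names are illustrative):

```shell
# Column 3 of `samtools idxstats` is the mapped-read count per reference
# sequence; keep only the names of scaffolds with at least 1000 reads.
samtools idxstats test.bam \
  | awk '$3 >= 1000 {print $1}' > keep_scaffolds.txt

# Extract alignments on those scaffolds into a new BAM and re-index it.
samtools view -b test.bam $(cat keep_scaffolds.txt) > filtered.bam
samtools index filtered.bam
```

Then point bambu at filtered.bam instead of test.bam to see whether the lowMemory=TRUE error goes away.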
