Run time estimation #9
Comments
Hi Pete,

The bottleneck is still there for large data sets. In your case, it is probably caused by the large number of cell barcodes. Normally, it runs within one or two days for ~10k cells, so 31k cells may well increase the running time. The running time also scales linearly with the number of candidate SNPs. To speed things up, you could split the candidate SNPs (e.g., by chromosome or at random) and run the pieces on multiple nodes if you are on a cluster. To estimate the running time, you can read the log file, which shows how many SNPs have been processed.

Yuanhua

Thanks, Yuanhua. I've started looking into providing a much-reduced set of candidate SNPs.

Thank you for this great tool.
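For anyone wanting to try the split-by-chromosome idea, here is a minimal Python sketch. It is not part of cellSNP: the function names and the output file naming are made up for illustration, and it assumes the candidate SNPs are in a plain or gzipped VCF that is small enough to hold in memory. Each chunk keeps a copy of the header so it stays a valid VCF on its own.

```python
import gzip


def split_vcf_by_chrom(lines):
    """Group VCF lines by chromosome; every group starts with the header."""
    header = []
    groups = {}
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column header lines go into every chunk.
            header.append(line)
            continue
        if not line.strip():
            continue  # skip blank lines
        chrom = line.split("\t", 1)[0]  # CHROM is the first VCF column
        groups.setdefault(chrom, []).append(line)
    return {chrom: header + records for chrom, records in groups.items()}


def write_chunks(vcf_path, out_prefix):
    """Write one uncompressed VCF per chromosome and return the new paths.

    The naming scheme (out_prefix.CHROM.vcf) is an arbitrary choice.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as fh:
        chunks = split_vcf_by_chrom(fh)
    paths = []
    for chrom, chunk in chunks.items():
        path = f"{out_prefix}.{chrom}.vcf"
        with open(path, "w") as out:
            out.writelines(chunk)
        paths.append(path)
    return paths
```

Each resulting file could then be given to a separate cellSNP job as its candidate SNP list, one job per node. Merging the per-chunk outputs afterwards is not covered here.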
Similar to #3 but wondering if things have changed.
Running cellSNP v0.1.7 as
with 31,707 barcodes on a 25G BAM file has been going for > 18 days!
It's still writing output, too (as of 2020-02-03 5PM):
% ll -t data/cellSNP/cellSNP.cells.vcf.gz.temp_*
-rw-r----- 1 hickey grpu_mritchie_1 2.0G Feb 3 17:02 data/cellSNP/cellSNP.cells.vcf.gz.temp_17_
-rw-r----- 1 hickey grpu_mritchie_1 2.4G Feb 3 17:02 data/cellSNP/cellSNP.cells.vcf.gz.temp_3_
-rw-r----- 1 hickey grpu_mritchie_1 2.3G Feb 3 17:02 data/cellSNP/cellSNP.cells.vcf.gz.temp_11_
-rw-r----- 1 hickey grpu_mritchie_1 2.2G Feb 3 17:02 data/cellSNP/cellSNP.cells.vcf.gz.temp_15_
-rw-r----- 1 hickey grpu_mritchie_1 2.0G Feb 3 17:01 data/cellSNP/cellSNP.cells.vcf.gz.temp_16_
-rw-r----- 1 hickey grpu_mritchie_1 1.1G Feb 3 16:59 data/cellSNP/cellSNP.cells.vcf.gz.temp_19_
-rw-r----- 1 hickey grpu_mritchie_1 1.9G Feb 3 16:56 data/cellSNP/cellSNP.cells.vcf.gz.temp_12_
-rw-r----- 1 hickey grpu_mritchie_1 2.2G Feb 3 16:55 data/cellSNP/cellSNP.cells.vcf.gz.temp_8_
-rw-r----- 1 hickey grpu_mritchie_1 2.4G Feb 3 16:54 data/cellSNP/cellSNP.cells.vcf.gz.temp_6_
-rw-r----- 1 hickey grpu_mritchie_1 2.0G Feb 3 16:54 data/cellSNP/cellSNP.cells.vcf.gz.temp_9_
-rw-r----- 1 hickey grpu_mritchie_1 2.0G Feb 3 16:51 data/cellSNP/cellSNP.cells.vcf.gz.temp_10_
-rw-r----- 1 hickey grpu_mritchie_1 2.1G Feb 3 16:50 data/cellSNP/cellSNP.cells.vcf.gz.temp_1_
-rw-r----- 1 hickey grpu_mritchie_1 2.2G Feb 3 16:46 data/cellSNP/cellSNP.cells.vcf.gz.temp_2_
-rw-r----- 1 hickey grpu_mritchie_1 2.3G Feb 3 16:38 data/cellSNP/cellSNP.cells.vcf.gz.temp_14_
-rw-r----- 1 hickey grpu_mritchie_1 2.3G Feb 3 16:37 data/cellSNP/cellSNP.cells.vcf.gz.temp_13_
-rw-r----- 1 hickey grpu_mritchie_1 2.0G Feb 3 16:32 data/cellSNP/cellSNP.cells.vcf.gz.temp_4_
-rw-r----- 1 hickey grpu_mritchie_1 2.5G Feb 3 16:19 data/cellSNP/cellSNP.cells.vcf.gz.temp_7_
-rw-r----- 1 hickey grpu_mritchie_1 2.1G Feb 3 15:04 data/cellSNP/cellSNP.cells.vcf.gz.temp_18_
-rw-r----- 1 hickey grpu_mritchie_1 1.9G Feb 3 13:44 data/cellSNP/cellSNP.cells.vcf.gz.temp_0_
-rw-r----- 1 hickey grpu_mritchie_1 1.8G Feb 2 23:52 data/cellSNP/cellSNP.cells.vcf.gz.temp_5_
I've run cellSNP before and although it took a few days it certainly didn't take this long.
I'm wondering:
--regionVCF, ...) might be causing this huge runtime?

Thanks,
Pete