lognumslots.sh sometimes underestimates the required number of slots #32

Open
hmusta opened this issue Aug 15, 2018 · 2 comments

hmusta commented Aug 15, 2018

I've noticed that on a small number of read sets (e.g. SRR522088), lognumslots.sh underestimates the number of slots needed in the CQF for squeakr-exact.

Here's my current workflow for gzipped fastq files:

ntcard -k 20 -c 2 -t 10 -p $OUTPREFIX $INPUT
NUMSLOTS=$(lognumslots.sh "${OUTPREFIX}_k20.hist")
squeakr-count -g -k 20 -s $NUMSLOTS -t 10 -o $OUTDIR/ $INPUT

In the case of SRR522088, the script computed 26 as the required number of slots, resulting in a segfault. When I set it to 27, it runs smoothly.

Since this script is only in the master branch, I was wondering if there's perhaps a version tuned for the exact branch that I may not be finding in the repo.

@prashantpandey
Member

Hi @hmusta, the current version of Squeakr supports auto-resizing when running with a single thread. So even if you underestimate the size, there won't be a segfault. Please try it and let me know if you still have any issues.

Thanks,
Prashant

t-kranz commented Mar 23, 2020

Hello,

I observed segfaults when using the value from lognumslots.sh as well, with the Squeakr version from Oct 2019 (should be 5ad2ad6).

This seems to happen frequently for me on very small test datasets.

Reproducing this should be quite simple:

Create a file (called 1.fastq) containing:

@1_1/1
TATGCACCAGAGTATGGAAGCATAAGCTCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCAGTCAACAAAGCCGAGTGGGCGCAACGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Then run ntcard followed by lognumslots.sh:

ntcard -k32 1.fastq -p ntcard.out
lognumslots.sh ntcard.out_k32.hist

lognumslots.sh returns 7, but the smallest value for which squeakr count doesn't crash is 10.

squeakr count -n -e -k 32 -s 7 -o 1.squeakr 1.fastq

results in a segfault, while

squeakr count -n -e -k 32 -s 10 -o 1.squeakr 1.fastq

works fine.
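
As a stopgap, the lognumslots.sh value could be treated as a lower bound and the count retried with larger values until it succeeds. Below is a minimal bash sketch of that idea. It assumes that a non-zero exit status (a segfault usually shows up as 139 in bash) means the CQF was too small, which may not hold if the command fails for another reason. find_working_slots and count_with_slots are hypothetical helper names, not part of Squeakr; the squeakr flags in the usage comment are the ones from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Squeakr.
# Usage: find_working_slots START MAX CMD [ARGS...]
# Runs CMD [ARGS...] SLOTS for SLOTS = START..MAX and prints the first
# value for which CMD exits with status 0. Returns 1 if none succeeds.
find_working_slots() {
    local s=$1 max=$2
    shift 2
    while [ "$s" -le "$max" ]; do
        # A crash (e.g. SIGSEGV) is reported as a non-zero status,
        # so the loop simply moves on to the next slot exponent.
        if "$@" "$s" >/dev/null 2>&1; then
            echo "$s"
            return 0
        fi
        s=$((s + 1))
    done
    return 1
}

# Example usage (assumes squeakr and lognumslots.sh are on PATH):
#   count_with_slots() { squeakr count -n -e -k 32 -s "$1" -o 1.squeakr 1.fastq; }
#   START=$(lognumslots.sh ntcard.out_k32.hist)
#   find_working_slots "$START" $((START + 5)) count_with_slots
```

The upper bound keeps the loop from running indefinitely when the failure is unrelated to the slot count; the offset of 5 in the usage comment is an arbitrary choice.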
