Empty .dict files #7

Open
OliverPStuart opened this issue Feb 7, 2021 · 0 comments
@OliverPStuart

I've been trying to run vargeno on non-human data and am running into problems at the indexing stage. No error is reported, but both .dict files are empty (0 bytes), so the genotyping step fails.

I'm working with a fragmentary reference assembly of a grasshopper genome, so both the bioinformatic and biological properties of the data are not at all what vargeno was designed for.

Do you have any tips for troubleshooting? Attached (here) is a sample of the .vcf input. Since my data aren't human and I'm not working with dbSNP, it's unclear to me how this file should be formatted. The variants were initially called with freebayes.
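One thing worth ruling out is the VCF content itself. freebayes reports indels, MNPs, complex and multiallelic events alongside SNPs, whereas the dbSNP workflow vargeno was built around centres on single-nucleotide variants. Assuming (not confirmed from vargeno's documentation) that vargeno only makes use of biallelic single-base SNPs whose CHROM names match the FASTA headers, a minimal Python sketch like the one below can pre-filter the file and count why records are dropped. The function names `read_fasta` and `filter_biallelic_snps` are hypothetical helpers, not part of vargeno.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

BASES = set("ACGT")

def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each sequence name (header text up to the first whitespace) to its uppercase sequence."""
    seqs: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in seqs.items()}

def filter_biallelic_snps(vcf_lines: Iterable[str], ref: Dict[str, str]) -> Tuple[List[str], Counter]:
    """Keep header lines plus biallelic single-base SNPs whose REF matches the reference.

    Assumption (hypothetical): these are the only records vargeno can index.
    Returns the kept lines and a Counter of reasons other records were dropped.
    """
    kept: List[str] = []
    dropped: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # keep meta-information and the #CHROM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or not fields[1].isdigit():
            dropped["malformed"] += 1
            continue
        chrom, pos = fields[0], int(fields[1])
        ref_allele, alt = fields[3].upper(), fields[4].upper()
        seq = ref.get(chrom)
        if seq is None:
            dropped["unknown_contig"] += 1        # CHROM not among the FASTA headers
        elif "," in alt:
            dropped["multiallelic"] += 1          # more than one ALT allele
        elif ref_allele not in BASES or alt not in BASES:
            dropped["not_single_base_snp"] += 1   # indel, MNP or complex event
        elif not 1 <= pos <= len(seq) or seq[pos - 1] != ref_allele:
            dropped["ref_mismatch"] += 1          # REF disagrees with the reference (POS is 1-based)
        else:
            kept.append(line)
    return kept, dropped

fasta = [">contig_1 len=10\n", "ACGTACGTAC\n", ">contig_2\n", "GGGG\n"]
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "contig_1\t3\t.\tG\tA\t50\t.\t.\n",
    "contig_1\t5\t.\tAC\tGT\t50\t.\t.\n",
    "contig_2\t1\t.\tG\tA,T\t50\t.\t.\n",
    "contig_9\t1\t.\tG\tA\t50\t.\t.\n",
    "contig_1\t1\t.\tC\tT\t50\t.\t.\n",
]
kept, dropped = filter_biallelic_snps(vcf, read_fasta(fasta))
print(len(kept), dict(dropped))
```

Writing `kept` back out gives a reduced VCF to retry `vargeno index` with, and the `dropped` counts show at a glance whether most of the freebayes calls (or their contig names) would be excluded under this assumption.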

Here is the terminal output:

$ vargeno index packardii.sub.fa snp.vcf test
[BloomFilter constructBfFromGenomeseq] bit vector: 755356701/9600000000
[BloomFilter constructBfFromGenomeseq] lite bit vector: 988176227/18400000000
[BloomFilter constructBfFromVCF] bit vector: 0/1120000000
SNP Dictionary
Total k-mers:        21626752
Unambig k-mers:      20575340
Ambig unique k-mers: 296062
Ambig total k-mers:  1051412
Ref Dictionary
Total k-mers:        1305711431
Unambig k-mers:      1130124620
Ambig unique k-mers: 36489256
Ambig total k-mers:  175586811
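The `[BloomFilter ...]` lines appear to report bits set over total bits for each filter; that reading is an assumption from the log format, not documented vargeno behaviour. Under that assumption, a short Python parser makes it easy to flag a filter that stayed empty, such as the `constructBfFromVCF` line above, which reports 0 bits set. `bloom_fill` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches lines such as "[BloomFilter constructBfFromVCF] bit vector: 0/1120000000".
BF_LINE = re.compile(r"\[BloomFilter (\w+)\] ([\w ]*bit vector): (\d+)/(\d+)")

def bloom_fill(log_text: str) -> List[Tuple[str, str, float]]:
    """Return (step, label, fraction of bits set) for every Bloom filter line in the log."""
    rows = []
    for m in BF_LINE.finditer(log_text):
        set_bits, total = int(m.group(3)), int(m.group(4))
        rows.append((m.group(1), m.group(2), set_bits / total if total else 0.0))
    return rows

log = """[BloomFilter constructBfFromGenomeseq] bit vector: 755356701/9600000000
[BloomFilter constructBfFromGenomeseq] lite bit vector: 988176227/18400000000
[BloomFilter constructBfFromVCF] bit vector: 0/1120000000"""

for step, label, frac in bloom_fill(log):
    flag = "  <-- empty" if frac == 0 else ""
    print(f"{step:28s} {label:16s} {frac:.4f}{flag}")
```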

And here are the output files:

-rw-r--r--  1 oliver users   12348187 Feb  5 11:42 test.chrlens
-rw-r--r--  1 oliver users 1200000008 Feb  5 10:43 test.ref.bf
-rw-r--r--  1 oliver users 2300000008 Feb  5 10:43 test.ref.bf.lite.bf
-rw-r--r--  1 oliver users          0 Feb  5 14:47 test.ref.dict
-rw-r--r--  1 oliver users  140000008 Feb  5 11:41 test.snp.bf
-rw-r--r--  1 oliver users          0 Feb  5 11:42 test.snp.dict

All of the test files (in /vargeno/test) run fine and reproduce the provided output files. I'm running on Ubuntu 18.04.5 in a conda environment with the following packages:

# packages in environment at /home/oliver/miniconda2/envs/vargeno:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
bioawk                    1.0                  hed695b0_5    bioconda
libgcc-ng                 9.3.0               h2828fa1_18    conda-forge
libgomp                   9.3.0               h2828fa1_18    conda-forge
libstdcxx-ng              9.3.0               h6de172a_18    conda-forge
seqtk                     1.3                  hed695b0_2    bioconda
vargeno                   1.0.3                hc9558a2_1    bioconda
zlib                      1.2.11            h516909a_1010    conda-forge