
Investigate support for running SpliceAI #786

Open
lynnpais opened this issue May 14, 2024 · 1 comment
Comments

@lynnpais
Collaborator

Package is available to run with TensorFlow.

@bpblanken
Collaborator

Some early notes:

Was able to get the command line tool running:

pip install spliceai tensorflow
cat v03_pipeline/var/test/callsets/1kg_30variants.vcf | spliceai -R vep_data/hg19.fa -A grch37
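
For reference, the CLI writes its scores back into the VCF INFO field as a SpliceAI= annotation. A minimal sketch of reading those scores with pysam, assuming the documented ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL layout and a hypothetical output path:

import pysam

# Hypothetical path to a VCF written by the spliceai CLI above.
vcf = pysam.VariantFile("spliceai_output.vcf")
for record in vcf:
    # Each SpliceAI INFO value is pipe-delimited:
    # ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL
    for annotation in record.info.get("SpliceAI", []):
        fields = annotation.split("|")
        symbol = fields[1]
        delta_scores = [float(x) for x in fields[2:6]]
        print(record.chrom, record.pos, symbol, max(delta_scores))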

We have a couple of options:

  1. Try to hack spliceai into hail's VEP call setup (which runs hail table -> stdout -> command execution -> hail table). This is the least work but the most brittle.
  2. Do something similar to what we've done with the ClinGen allele registry and manage the hail export, shell exec, VCF parse, and hail import ourselves (see the sketch after this list).
    • The main concern here is that performance is quite poor: I'm seeing about 20 variants/s per worker when running locally on a VCF, compared with ~150 variants/s per worker for VEP.
  3. Read and digest the spliceai source and try to run the Keras models in batch over a variant list rather than one-by-one. Just spitballing, I'd guess this is at least 2 weeks of work for me on its own, with maybe an 80% chance of succeeding.
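
A rough sketch of option 2, assuming the variants live in a Hail Table keyed by locus/alleles and using hypothetical local paths; the real version would need to run per partition, which is where the ~20 variants/s per worker concern comes from:

import subprocess
import hail as hl

# Hypothetical input; the real pipeline would shard this per partition.
ht = hl.read_table("variants.ht")

# 1. Export the variants to a sites-only VCF for the spliceai CLI.
hl.export_vcf(ht, "spliceai_input.vcf")

# 2. Shell out to the spliceai command line tool (same flags as above).
subprocess.run(
    [
        "spliceai",
        "-I", "spliceai_input.vcf",
        "-O", "spliceai_output.vcf",
        "-R", "vep_data/hg19.fa",
        "-A", "grch37",
    ],
    check=True,
)

# 3. Re-import the annotated VCF and join the SpliceAI INFO field back on.
annotated = hl.import_vcf("spliceai_output.vcf", reference_genome="GRCh37").rows()
ht = ht.annotate(spliceai=annotated[ht.key].info.SpliceAI)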

Regardless, we should read more in depth / have a conversation with BenW about the bug fixes and changes he's made to the spliceai source on his fork.
