Incorrect CIGAR string generation in versions 0.16 through 0.21 #270

Open
GRGong opened this issue Sep 11, 2024 · 8 comments

GRGong commented Sep 11, 2024

Dear wfmash developers,

I've identified an issue with CIGAR strings in PAF files generated by wfmash versions 0.16 and later. This problem appears to be related to the inversion patching feature introduced in v0.16.

Key points:

  • Affected versions: 0.16, 0.17, 0.18, 0.21 (not tested: 0.19, 0.20)
  • The issue causes problems when processing PAF files with other tools, such as rustybam.

Example error (using rustybam break-paf -m 5000; attachment: toy.zip):

thread 'main' panicked at src/paf.rs:71:43:
called Result::unwrap() on an Err value: PafParseCigar { msg: "query bases 4000 from cigar does not equal 59000-55354=3646\nCM055321.1\t82983525\t55354\t59000\t-\tscaffold_1\t182733053\t172879498\t172883499\t3627\t4126\t9\tid:Z:\tcg:Z:3X1=2I1=3X1=1X2=1X2=1X1=1I1=1X1=1X1=1X1=2X2=1X3=2X1=1X1=1X1=1D2=1X3=1I1X1=1X2=1X1=59I3=2X1=1X2=2I2=1X1=1X1=1X2=1X3=3X1=2X1=2X1=1X2=3I1X1=2X3=1X4=1X3=1X1=1X1=1I1X2=4X1=1X2=1X1=1X7D2=1X1=2X4=1X1=1X3=1X3=1X3=4X1=1X4=1X1=1X1=1X1=5X8=1X2=2D1=2X1=1X1=5X1=1X2=1D1X5=1X1=2X1=1X2=1X1=1X3=1X1=78D38=20I110=3D2=2X18=1X67=1X419=1X20=3I367=1X113=1X63=1X82=1X332=1I17=1D84=1X32=1X161=1X25=1X123=1X225=1X157=3D21=1X24=1X282=2I278=1X214=1X46=1X3=5D3=1X2=1X1=2I1X3=1X1=1X2=4D1=2X1=3X1=1X4=1X1=4D1X1=2X3=1X2=2D1=1X1=1X2=1X1=1X2=1X2=2I1=1X2=2X2=1X5=1X4=3D1=1X1=2X1=1X3=1X2=6D3=3X4=3X3=3X1=1X1=2X1=3X1=1X1=1X1=2X1=3X2=2X1=1X1=2X2=1X1=2X3=1X2=1X1=4X1=1X1=2I3=1I1=1X3=2X2=1X1=2D1X2=3X2=3X1=2X1=3X3=1X1=1X3=2X2=3X4=1X2=1D3=3D1X4=2X1=1X1=2X3=1X3=3X1=2X1=4X1=1X4=2I1=1X2=3X1=1X2=2I1X1=1X4=1I2X1=1X1=1X1=2X3=1X2=1X1=1X2=11I1X4=3X3=1X2=1X1=2X2=1X1=1X1=7I1X3=1X\n" }
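
The error above says that the query bases implied by the CIGAR (4000) differ from the query span given in the PAF columns (59000-55354=3646). A minimal Python sketch of that kind of consistency check follows. It is not rustybam's or wfmash's code; the function names are hypothetical, and it assumes the standard PAF column order with the CIGAR stored in a `cg:Z:` tag.

```python
import re

# Operations that consume query and/or target bases (SAM/PAF CIGAR semantics).
QUERY_OPS = set("MI=X")
TARGET_OPS = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDN=X])")
VALID_CIGAR_RE = re.compile(r"(?:\d+[MIDN=X])+")


def cigar_spans(cigar):
    """Return (query_bases, target_bases) consumed by a CIGAR string."""
    if not VALID_CIGAR_RE.fullmatch(cigar):
        raise ValueError("malformed CIGAR: " + cigar[:40])
    query = target = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in QUERY_OPS:
            query += n
        if op in TARGET_OPS:
            target += n
    return query, target


def check_paf_line(line):
    """Compare CIGAR-implied lengths with the PAF coordinate spans.

    Returns (query_from_cigar, query_span, target_from_cigar, target_span),
    or None when the record carries no cg:Z: tag.
    """
    fields = line.rstrip("\n").split("\t")
    q_span = int(fields[3]) - int(fields[2])
    t_span = int(fields[8]) - int(fields[7])
    cigar = next((f[5:] for f in fields[12:] if f.startswith("cg:Z:")), None)
    if cigar is None:
        return None
    q_cigar, t_cigar = cigar_spans(cigar)
    return q_cigar, q_span, t_cigar, t_span


# Toy records (made up for illustration, not taken from the attached data).
good = "q\t100\t10\t20\t+\tt\t200\t50\t61\t9\t11\t60\tcg:Z:5=1X4=1D"
bad = "q\t100\t10\t20\t+\tt\t200\t50\t61\t9\t13\t60\tcg:Z:5=1X4=2I1D"
print(check_paf_line(good))  # (10, 10, 11, 11): consistent
print(check_paf_line(bad))   # (12, 10, 11, 11): query length mismatch
```

A record is consistent only when both pairs agree; in the reported record the query pair disagrees.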

Steps to reproduce:

  • Generate a PAF file using wfmash v0.21 with parameters: -t 96 -4 -p 60
  • Process the resulting PAF file with rustybam

This issue does not occur with wfmash v0.15.

Could you please investigate this CIGAR string inconsistency? It would be helpful to understand if this is a bug or if there have been changes in the CIGAR string format that need to be addressed in downstream tools.

Thank you for your attention to this matter.

Best regards,
Gaorui

Collaborator

ekg commented Sep 16, 2024

The next release will resolve this. Thanks!

Collaborator

ekg commented Sep 20, 2024

Does the current main HEAD resolve this issue? I've now added integration tests of PAF correctness using https://github.com/ekg/pafcheck, which should be equivalent to the SAM correctness checks.
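
For readers unfamiliar with what a base-level PAF correctness test involves, here is a minimal Python sketch. It is not pafcheck's implementation; `verify_cigar` and `revcomp` are hypothetical names. It assumes the two aligned subsequences have already been sliced out of the FASTA files using the PAF start/end columns, that the CIGAR uses only `=`, `X`, `I`, and `D`, and that for `-` strand records the CIGAR describes the reverse complement of the query against the forward target.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
VALID_CIGAR_RE = re.compile(r"(?:\d+[=XID])+")


def revcomp(seq):
    """Reverse-complement a DNA sequence (A/C/G/T only in this sketch)."""
    return seq.translate(COMPLEMENT)[::-1]


def verify_cigar(query, target, strand, cigar):
    """Return True if the CIGAR is consistent with the two sequences.

    Checks that '=' runs are real matches, 'X' runs are real mismatches,
    and that the CIGAR consumes both sequences exactly.
    """
    if not VALID_CIGAR_RE.fullmatch(cigar):
        return False
    if strand == "-":
        query = revcomp(query)
    qi = ti = 0
    for length, op in re.findall(r"(\d+)([=XID])", cigar):
        n = int(length)
        if op in "=X":
            # The run must fit inside both sequences.
            if qi + n > len(query) or ti + n > len(target):
                return False
            for k in range(n):
                same = query[qi + k].upper() == target[ti + k].upper()
                if same != (op == "="):
                    return False
            qi += n
            ti += n
        elif op == "I":
            qi += n  # insertion: bases present only in the query
        else:
            ti += n  # deletion: bases present only in the target
    return qi == len(query) and ti == len(target)


# Toy sequences, made up for illustration.
print(verify_cigar("ACGTAC", "ACGAC", "+", "3=1I2="))  # True
print(verify_cigar("ACGTAC", "ACGAC", "+", "3=1X2="))  # False
print(verify_cigar("GTACGT", "ACGAC", "-", "3=1I2="))  # True
```

A length-only check catches the error reported in this issue, while a base-level check like this one also catches CIGARs with correct totals but misplaced operations.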

Author

GRGong commented Sep 23, 2024

Thank you for the quick response!
Unfortunately, I am working on a cluster that lacks some necessary libraries, and I am unable to compile wfmash from source.
Would it be possible for you to provide a precompiled binary of wfmash?


baozg commented Sep 23, 2024

@GRGong You could change the Dockerfile to build a Docker image from wfmash HEAD. If you don't have root access, the Singularity remote builder might help (https://cloud.sylabs.io/builder).

Collaborator

ekg commented Sep 23, 2024

@GRGong here's a wfmash binary. I should probably make a release, but I prefer to do that once you've confirmed that this resolves the issues you're seeing. If not, we should resolve and add some automated tests to prevent future problems. Right now I'm testing SAM, PAF, and MAF conversion steps using github actions.

Just gunzip and make sure it's executable: wfmash-v0.21.0-38-gb731e41.gz

Author

GRGong commented Sep 24, 2024

@ekg Thanks for the binary. I tested the provided binary using my own genomes, but it still has the CIGAR problem. For your reference, I’ve uploaded the query and target FASTA files, along with the command I used and the error log.

Here is the link:
https://drive.google.com/file/d/18MzFalZhVnKt-hTfTxmxI2KsdZxh6Zsf/view?usp=sharing

Note: The two genomes belong to divergent insect species, but they are still in the same subfamily. The previous version, wfmash v0.15, worked without issues.

Author

GRGong commented Dec 17, 2024

Hi,

I want to follow up regarding the issue.
Could you kindly let me know if there have been any updates or insights regarding this issue?

Best,
gr

Collaborator

ekg commented Dec 17, 2024 via email
