paf2maf fail with FastGA and wfmash #15

Open
baozg opened this issue Sep 10, 2024 · 4 comments
Labels
enhancement New feature or request

Comments


baozg commented Sep 10, 2024

Hi Wenjie,

I have run into a problem when using wgatools paf2maf. It works with minimap2 and AnchorWave PAF output, but it currently fails with FastGA and wfmash alignments. I checked the alignments and they look fine to me. Could you help me find out what is wrong with them?

https://keeper.mpdl.mpg.de/d/4b78e4b87c0449d3b821/
Col-CC is the target and Ler-0 is the query. The wgatools folder contains the FastGA and wfmash alignments.

wjwei-handsome (Owner) commented

Hi @baozg ,

I checked the wfmash data. The problem is that the query coordinates do not add up:

query_end != query_start + Match/Mismatch + INS_size

The target positions look correct.

A valid PAF record should satisfy these two identities:

query_start + Match/Mismatch + INS_size = query_end 
ref_start + Match/Mismatch + DEL_size = ref_end
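
The two identities can be checked per record with a few lines of Python. This is only a minimal sketch, not the wgatools implementation: it assumes the CIGAR is stored in a `cg:Z:` tag, and the helper names `cigar_spans` and `check_paf_line` as well as the tiny example record are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_spans(cigar):
    """Return (query_span, target_span) consumed by a CIGAR string."""
    query_span = target_span = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":        # match/mismatch consumes query and target
            query_span += n
            target_span += n
        elif op == "I":        # insertion consumes query only
            query_span += n
        elif op in "DN":       # deletion/skip consumes target only
            target_span += n
        # S, H, P are ignored here (assumption: PAF coordinates exclude clipping)
    return query_span, target_span

def check_paf_line(line):
    """Return (query_ok, target_ok), or None if the line has no cg:Z: tag."""
    fields = line.rstrip("\n").split("\t")
    q_start, q_end = int(fields[2]), int(fields[3])
    t_start, t_end = int(fields[7]), int(fields[8])
    cigar = next((f[5:] for f in fields[12:] if f.startswith("cg:Z:")), None)
    if cigar is None:
        return None
    q_span, t_span = cigar_spans(cigar)
    return (q_start + q_span == q_end, t_start + t_span == t_end)

# Hypothetical records: the second one has a query end that is one base short.
good = "q1\t1000\t100\t110\t+\tt1\t2000\t200\t211\t8\t13\t60\tcg:Z:5=1X2I3D2="
bad = "q1\t1000\t100\t109\t+\tt1\t2000\t200\t211\t8\t13\t60\tcg:Z:5=1X2I3D2="
print(check_paf_line(good))  # (True, True)
print(check_paf_line(bad))   # (False, True)
```

A record like `bad` shows the same pattern as the wfmash output: the target side is consistent while the query end is not.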

I also checked the output of an older wfmash version (v0.12.1-5-gd6532bc), and it looks fine.

This also gave me an idea: I will develop a command to validate PAF files, and perhaps also repair invalid ones.

As for FastGA, it seems to reverse the order of query and target in the PAF file 😵‍💫: if I swap the target and query FASTA files, everything works fine. Either FastGA writes target and query in reversed order, or your input does not match what FastGA expects :)
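
One way to spot such a swap before running paf2maf is to compare the sequence names and lengths in the PAF (columns 1-2 for the query, 6-7 for the target) with the lengths in the two FASTA files. The sketch below is hypothetical: `classify_orientation` is an illustrative helper, and the length dictionaries are assumed to have been built beforehand, for example from `.fai` indexes. The Chr1 lengths are the ones that appear in the wfmash record in this thread.

```python
def classify_orientation(fields, query_lengths, target_lengths):
    """Classify a PAF record as 'ok', 'swapped' or 'mismatch'.

    fields: the tab-split PAF record.
    query_lengths / target_lengths: {sequence name: length} for each FASTA.
    """
    q_name, q_len = fields[0], int(fields[1])
    t_name, t_len = fields[5], int(fields[6])
    if query_lengths.get(q_name) == q_len and target_lengths.get(t_name) == t_len:
        return "ok"
    if target_lengths.get(q_name) == q_len and query_lengths.get(t_name) == t_len:
        return "swapped"
    return "mismatch"

# Lengths as seen in the wfmash record (Ler-0 as query, Col-CC as target).
ler_lengths = {"Chr1": 32485061}
col_lengths = {"Chr1": 32637894}

normal = ["Chr1", "32485061", "0", "100", "+", "Chr1", "32637894", "0", "100"]
flipped = ["Chr1", "32637894", "0", "100", "+", "Chr1", "32485061", "0", "100"]
print(classify_orientation(normal, ler_lengths, col_lengths))   # ok
print(classify_orientation(flipped, ler_lengths, col_lengths))  # swapped
```

This only works when the two assemblies differ in sequence names or lengths, which is the case for Chr1 here.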

I hope this is helpful to you. Please keep in touch if you have any questions later!

Best regards,
Wenjie

wjwei-handsome (Owner) commented

> sed -n '64p' Col-CC_Ler-0_MPIBT.wfmash_21.paf
Chr1	32485061	3030000	3079896	+	Chr1	32637894	3030124	3080307	49819	50214	21	gi:f:0.996241	bi:f:0.992134	md:f:0.996967	cg:Z:[.....]
> math 3030000+49819+150+31 // q_start=3030000 match_size=49819 mismatch_size=150 ins_size=31
3080000 // The correct query end is 3080000
> sed -n '64p' Col-CC_Ler-0_MPIBT.wfmash_21.paf|sed 's/3079896/3080000/g'|wgatools p2m -g Col-CC.chr.fa.gz -q Ler-0.fa.gz -o test.maf -r // everything is OK
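
Note that `sed 's/3079896/3080000/g'` rewrites those digits wherever they occur on the line, not only in the query-end column. A column-targeted patch avoids that. The following is a rough sketch with a hypothetical helper `patch_query_end`. It uses the first 12 columns of the record above (the tags are left out because the CIGAR is elided in this thread) and the match, mismatch and insertion counts from the calculation above.

```python
def patch_query_end(line, matches, mismatches, insertions):
    """Rewrite only column 4 (query end) from query start plus the given counts."""
    fields = line.rstrip("\n").split("\t")
    q_start = int(fields[2])
    fields[3] = str(q_start + matches + mismatches + insertions)
    return "\t".join(fields)

# First 12 columns of line 64 of the wfmash PAF shown above.
record = "\t".join([
    "Chr1", "32485061", "3030000", "3079896", "+",
    "Chr1", "32637894", "3030124", "3080307", "49819", "50214", "21",
])

fixed = patch_query_end(record, matches=49819, mismatches=150, insertions=31)
print(fixed.split("\t")[3])  # 3080000
```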

@baozg baozg closed this as completed Sep 11, 2024
@wjwei-handsome wjwei-handsome added the enhancement New feature or request label Sep 12, 2024
wjwei-handsome (Owner) commented

Reopening this as a reminder to myself to develop the validate subcommand 🤖

wjwei-handsome (Owner) commented

Whoo hoo

validate sub-cmd done in 53c57fa
