maf-index error: Different order between records #17

Open
sivico26 opened this issue Oct 14, 2024 · 3 comments
Comments

@sivico26

Hi,

Thanks for developing wgatools, it looks very promising.

I am particularly interested in visualizing some .maf files that I generated by progressive alignment with cactus, and then converting to .maf with cactus-hal2maf.

When I try to index the resulting files I get the following:

$ wgatools maf-index reg11_nobav.maf 
2024-10-14T15:23:16.139440118+02:00 ERROR There is a different order between Records!

Even though the message sounds simple enough, I am unsure what exactly the problem is or how to address it. Can I ask you to elaborate on this one?

Here is one of my input files: reg11_nobav.maf.txt

Thank you in advance for your help.

@wjwei-handsome
Owner

Hi @sivico26, sorry for the late reply.

The reason for this error is:

The species order in a maf file should be fixed, such as:

a ...
b ...
c ...

a ...
b ...
c ...

BUT NOT:

a ...
b ...
c ...

a ...
c ...
b ...

In the file you provided, I found a strange block at L5062-L5070, which looks like:

   s       ref.Cexcelsa_scaf_4:10496403-10611174   112460  1       +       114771  A
   s       cdan.cdan       98736   1       +       104684  C
   s       cgro6.cgro6     104117  1       +       115507  C
   s       cgro7.cgro7     92928   1       +       100827  C
   s       coff.coff       92012   1       +       101072  A
   s       cpol.cpol       108588  1       +       112818  A
   s       ctat.ctat       93892   1       +       102711  A

The number of species in this block does not match the total number of species in your file. Please check!
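
For anyone who wants to locate such blocks themselves, here is a minimal Python sketch of the check described above. It is not the code wgatools runs. The function names are made up for illustration, and it assumes the species is the part of the source name before the first `.` (for example `cdan` in `cdan.cdan`).

```python
def maf_blocks(lines):
    """Yield, for each alignment block, the species names found on its 's' lines."""
    block = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new 'a' line closes the current block.
            if block:
                yield block
                block = []
        elif fields[0] == "s":
            # Assumption: species = source name up to the first '.'.
            block.append(fields[1].split(".")[0])
    if block:
        yield block


def first_inconsistent_block(lines):
    """Return (block_index, names) for the first block whose species list
    differs from the first block's list, or None if all blocks agree."""
    expected = None
    for index, names in enumerate(maf_blocks(lines)):
        if expected is None:
            expected = names
        elif names != expected:
            return index, names
    return None
```

Calling `first_inconsistent_block(open("reg11_nobav.maf"))` would report the 0-based index of the first block that is reordered or is missing a species, together with the species it contains.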

If you need further help, please let me know :)

Best,
Wenjie

@sivico26
Author

sivico26 commented Oct 22, 2024

Hi again, sorry for the late response too.

The default export of cactus-hal2maf makes it virtually impossible to fulfill the criterion. But I started to play with some of its parameters (--maximumBlockLengthToMerge and --maximumGapLength) and managed to export the alignment in only 3 blocks. And that new alignment was indexable.

Now, I started to look at it, but the viewer crashed with this error:

2024-10-22T13:15:43.992569517+02:00 ERROR scroll out of u16 range, This error is due to the scrolling limit of `ratatui`(https://github.com/ratatui-org/ratatui/issues/399). You can temporarily use the `chunk` subcommand to chunk it with a appropriate size (< 65535).

I guess alignments this big were not expected?

Also, probably related, when I tried to navigate the alignment in the toggle view, it also crashed when trying to jump to the second block in the image:

image

After the crash, the terminal graphics remain compromised (maybe a process was not properly terminated?). The prompt does not align to the left and you can no longer see what you are typing (although the terminal does respond to your commands):

image

Anyway, this has now become a separate issue. Would you prefer that I open another one? But before leaving the original topic: is that requirement strictly necessary? I understand the requirement on species order, but at least for cactus alignments it is very common to find sequences with indels that break blocks, so very often a block will miss a species that then reappears in the next block. I am asking because I think this requirement greatly restricts the range of alignments that can be indexed.
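
To make the relaxed criterion concrete, here is a small Python sketch of what I have in mind: a block passes if its species appear in the same relative order as in a reference list, even when some are missing. The function name and the example species list are hypothetical, and this does not describe how wgatools behaves today.

```python
def keeps_order(names, reference):
    """Return True if `names` is an ordered subsequence of `reference`,
    i.e. species may be absent but never swapped."""
    remaining = iter(reference)
    # `name in remaining` advances the iterator, so each match must come
    # after the previous one in the reference order.
    return all(name in remaining for name in names)


reference = ["ref", "cdan", "cgro6", "cgro7"]
print(keeps_order(["ref", "cgro6"], reference))  # species missing, order kept
print(keeps_order(["cgro6", "ref"], reference))  # species swapped
```

The first call returns `True` and the second returns `False`.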

@wjwei-handsome
Owner

Hi @sivico26 !

  1. The visualization problem occurs because a line in the maf file is too long, which exceeds the scrolling range. To work around this restriction, you can use the wgatools chunk subcommand to split the file into chunks, which will make everything smoother, including navigation.
  2. As for the MAF integrity requirement, I think this is a great suggestion and I will implement it (but it might take some time). Indeed, as you said, incomplete blocks may be common in cross-species alignments. If you like, you can open a new issue describing your idea and tag it with the enhancement label.
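
To illustrate what chunking means for a single record, here is a rough Python sketch that cuts one `s` line into column windows and recomputes the start and size of each piece. It only shows the idea and is not the implementation of the `chunk` subcommand. The function name and the tuple layout are assumptions made for the example.

```python
def chunk_s_line(src, start, strand, src_size, text, width):
    """Split one MAF 's' record into windows of at most `width` alignment columns.

    Returns a list of (src, start, size, strand, src_size, text) tuples.
    """
    chunks = []
    pos = start
    for col in range(0, len(text), width):
        piece = text[col:col + width]
        # Gap characters occupy a column but do not consume sequence.
        size = len(piece) - piece.count("-")
        chunks.append((src, pos, size, strand, src_size, piece))
        pos += size
    return chunks
```

With a `width` below 65535, every piece stays inside the scrolling limit mentioned in the error message.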

Thanks!
Wenjie
