results that go missing when the protein sequence is made longer (or software version changes) #342

Open
krabapple opened this issue Nov 13, 2023 · 1 comment

krabapple commented Nov 13, 2023

I don't know if this is expected behavior (I hope not).

I have a 211aa protein sequence that is truncated -- i.e., does not end in a stop codon:

MQTQAFCGIIQIDGTFLFCIKKGNLLIIGTPAPNNRLIPIAFAWSVSENTITIKDMLTKLKSFIPPSRFKNIYSDQGPAIIAAVRESGFSCDHKFCLRHFATKREYINVYSEIVEVAYADHPQKRIDLIKKLETRLQEEYPNRENNQDLFKYLDSINPFEGFADYTAGILTTSLIESLNAEIKDKWDTYEPAELIIRLIEHEFNLVKNVLT

When I run it through interproscan-5.61-93 (as well as two earlier versions), it returns a Pfam MULE transposase domain at residues 9-101 (PF10551/IPR018289).
From other contextual data, I do expect this to be a possible MULE transposase.

When I extend the sequence to the next in-frame stop codon (i.e., completing it), the result is a 355aa sequence:
MQTQAFCGIIQIDGTFLFCIKKGNLLIIGTPAPNNRLIPIAFAWSVSENTITIKDMLTKLKSFIPPSRFKNIYSDQGPAIIAAVRESGFSCDHKFCLRHFATKREYINVYSEIVEVAYADHPQKRIDLIKKLETRLQEEYPNRENNQDLFKYLDSINPFEGFADYTAGILTTSLIESLNAEIKDKWDTYEPAELIIRLIEHEFNLVKNVLTGDFKSDNIIKNLNETLKHSDMFSSVLYDPIQELYYATFGRYTYCVKIMSDSQYSCTCKHIELYGLPCIHVIAVLNHFSNKNLLKNLNDAVHARFKCSEFMTPVEDLMKFYVDQASLKIPGINFNLGEIEKLRGKRTRIKAFYEK*

However, when this longer version is run through interproscan-5.61-93, it only returns hits to zinc finger domains/profiles, at residues ~250-290 (i.e., entirely within the added sequence). No hits are returned for any other region.

Moreover, I tested this with interproscan releases 5.55-88, 5.56-89, 5.61-93, and 5.64-96. The short vs long behavior is the same for 5.55-88, 5.56-89, and 5.61-93; 5.64-96 doesn't return a MULE hit at all, even with the short sequence.

I explored the behavior further by extending the 211aa sequence in 25aa increments:
5.55-88 and 5.56-89: MULE @9-101 returned for lengths up to 235aa, but not for >=261aa, where only the Zn finger @250-290 is returned
5.61-93: MULE @9-101 returned for lengths up to 285aa, but not for >=311aa, where only the Zn finger @250-290 is returned
5.64-96: no MULE hits at any length, only the Zn finger @250-290
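For anyone who wants to reproduce the length series, here is a minimal sketch of how the stepwise extensions could be generated as a FASTA file. It is not the exact script used for the results above; the function names, the `query` record prefix, and the choice to always include the full-length sequence as the last record are assumptions made for illustration.

```python
def extend_in_steps(full_seq, base_len, step):
    """Yield (length, prefix) pairs, starting at base_len and growing by step.

    A trailing '*' (stop codon marker) is dropped, and the full-length
    sequence is always included as the final entry.
    """
    seq = full_seq.rstrip("*")
    lengths = list(range(base_len, len(seq), step)) + [len(seq)]
    for n in lengths:
        yield n, seq[:n]


def to_fasta(full_seq, base_len, step, name="query"):
    """Return one FASTA record per extension, named <name>_<length>aa."""
    records = [
        f">{name}_{n}aa\n{prefix}"
        for n, prefix in extend_in_steps(full_seq, base_len, step)
    ]
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Placeholder sequence; substitute the real full-length protein here.
    demo = "ACDEFGHIKLMNPQRSTVWY" * 3 + "*"
    print(to_fasta(demo, base_len=10, step=25))
```

Running every release on the same FASTA file keeps the inputs identical, so any difference in hits can be attributed to length or version rather than to how the sequences were prepared.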

(Worryingly, the latest version, 5.65-97, on the EBI web portal gives the same results as 5.64-96.)

So the results vary both with sequence length and with software version.
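To make this kind of comparison easier to repeat, the following sketch lists signatures that appear in one run's TSV output but not in another's. It assumes the standard InterProScan TSV column order (protein accession, MD5, length, analysis, signature accession, description, start, stop, ...); the function names and the sample rows are hypothetical, and `ZNF_PLACEHOLDER` stands in for whatever zinc finger signature is actually reported.

```python
def signatures(tsv_text):
    """Return the set of (analysis, signature accession) pairs in a TSV run."""
    found = set()
    for line in tsv_text.splitlines():
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # skip blank or malformed lines
        found.add((cols[3], cols[4]))
    return found


def lost_signatures(run_a, run_b):
    """Signatures reported in run_a that are absent from run_b, sorted."""
    return sorted(signatures(run_a) - signatures(run_b))


if __name__ == "__main__":
    # Made-up rows in TSV layout, for illustration only.
    short_run = "q\tmd5\t211\tPfam\tPF10551\tMULE\t9\t101\t1e-5\tT\t13-11-2023\n"
    long_run = "q\tmd5\t355\tProSiteProfiles\tZNF_PLACEHOLDER\tZn finger\t250\t290\t9.0\tT\t13-11-2023\n"
    print(lost_signatures(short_run, long_run))
```

The comparison ignores coordinates and protein IDs on purpose, since the short and long inputs have different names and lengths; it only answers whether a signature was reported at all.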

PF10551/IPR018289 still exist in the InterPro database (they have not been deprecated), so that is not the cause.

Can you explain why this is happening?

@alanlamsiu

Hi @krabapple. I am also having the same issue. Did you find an answer for this?
