Query on Alphabet Consistency Across Different Scales of ESM2 Models [8M, 35M, 150M, 650M, 3B, 15B] #668

CNwangbin · 2024-03-08T04:10:23Z

I am currently exploring the potential of leveraging the ESM2 series of models for a project involving protein sequence analysis. Given the diversity in the scale of models available, I have a specific question that I hope you could help clarify.
Could you please confirm if all these variants of the ESM2 models use an identical alphabet for encoding protein sequences into tokens? Essentially, I am interested in understanding whether the token sequences generated from the same protein sequence would be identical across these different model scales.
The reason behind this inquiry is to ensure that our preprocessing pipeline remains consistent and compatible when utilizing multiple versions of the ESM2 models for comparative analysis.
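
One way to check this empirically, whatever the answer turns out to be, is to encode the same sequences with each model's tokenizer and compare the resulting ids. The sketch below is a minimal, library-independent illustration: `find_tokenization_mismatches` and `make_toy_encoder` are hypothetical helper names, and the toy encoders are stand-ins, not the real ESM2 alphabets. In practice each entry would be the encoding function that ships with the corresponding checkpoint (assumed here to map a sequence string to a list of integer ids).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# An encoder maps a protein sequence string to a list of integer token ids.
Encoder = Callable[[str], List[int]]


def find_tokenization_mismatches(
    encoders: Dict[str, Encoder], sequences: Sequence[str]
) -> List[Tuple[str, str]]:
    """Return (sequence, model_name) pairs whose ids differ from the first encoder's."""
    names = list(encoders)
    reference_name = names[0]
    mismatches = []
    for seq in sequences:
        reference_ids = encoders[reference_name](seq)
        for name in names[1:]:
            if encoders[name](seq) != reference_ids:
                mismatches.append((seq, name))
    return mismatches


def make_toy_encoder(vocab: str) -> Encoder:
    """Hypothetical stand-in: each residue's id is its position in `vocab`."""
    index = {ch: i for i, ch in enumerate(vocab)}
    return lambda seq: [index[ch] for ch in seq]


# Toy data only: model_a and model_b share a vocabulary order, model_c does not.
toy_encoders = {
    "model_a": make_toy_encoder("ACDEFGHIKLMNPQRSTVWY"),
    "model_b": make_toy_encoder("ACDEFGHIKLMNPQRSTVWY"),
    "model_c": make_toy_encoder("YWVTSRQPNMLKIHGFEDCA"),
}

print(find_tokenization_mismatches(toy_encoders, ["MKTAYIAK", "GAVLI"]))
```

An empty result over a representative set of sequences (including any non-standard residues the pipeline may encounter) would suggest the preprocessing can be shared across model scales; any reported pair points to the model whose tokenization diverges.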
