I'm working in the branch unified-attention.
I managed to split the weights of the QKV projections, but something is still missing: when prompting, the model doesn't generate correct output.
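One common pitfall when splitting the fused projection: GPT-2 checkpoints store `c_attn` as a Conv1D whose weight has shape `(n_embd, 3 * n_embd)`, while `torch.nn.Linear` expects `(out_features, in_features)`, so each slice must also be transposed. A minimal sketch of the split (the helper name `split_qkv` is hypothetical, and the layout assumption should be checked against the actual checkpoint):

```python
import numpy as np

def split_qkv(c_attn_weight, c_attn_bias):
    """Split GPT-2's fused QKV projection into separate Q, K, V weights.

    Assumes the GPT-2 Conv1D layout: weight shape (n_embd, 3 * n_embd),
    bias shape (3 * n_embd,). Each weight slice is transposed so it can
    be assigned to a torch.nn.Linear; a missing transpose is a common
    reason generation breaks after splitting.
    """
    n_embd = c_attn_weight.shape[0]
    assert c_attn_weight.shape == (n_embd, 3 * n_embd)
    w_q, w_k, w_v = np.split(c_attn_weight, 3, axis=1)
    b_q, b_k, b_v = np.split(c_attn_bias, 3)
    return [(w.T.copy(), b) for w, b in ((w_q, b_q), (w_k, b_k), (w_v, b_v))]
```

If generation produces garbage after the split, comparing one layer's Q/K/V outputs against the original fused projection on the same input is a quick way to localize the bug.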
Upgrade the easier-to-understand GPT-2 attention code so that it can load pre-trained GPT-2 weights,
i.e. avoid separate loaders/code paths for pre-trained and non-pre-trained model weights: https://github.com/LxMLS/lxmls-toolkit/blob/master/lxmls/transformers/model.py#L123
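One way to avoid the duplicated loader is to keep a single construction path and treat pre-trained weights as an optional overwrite step after random init. A hypothetical sketch (class and helper names are illustrative, not the toolkit's actual API):

```python
import numpy as np

def layer_shapes(n_embd):
    # Hypothetical: only the attention projections, for illustration.
    return {"w_q": (n_embd, n_embd), "w_k": (n_embd, n_embd), "w_v": (n_embd, n_embd)}

class TinyGPT2:
    """Single construction path for both scratch and pre-trained models."""

    def __init__(self, n_embd):
        # Always build with random init -- the one and only init path.
        rng = np.random.default_rng(0)
        self.params = {name: rng.normal(size=shape)
                       for name, shape in layer_shapes(n_embd).items()}

    @classmethod
    def from_pretrained(cls, n_embd, state_dict):
        # Reuse __init__ unchanged, then overwrite params from the checkpoint,
        # so there is no separate loader code to keep in sync.
        model = cls(n_embd)
        model.params.update(state_dict)
        return model
```

With this pattern, the pre-trained loader can never drift out of sync with the scratch model, because both go through the same `__init__`.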