
Embedding Matrix Size Not Resized Properly - Bug Report #1483

Open
sumukshashidhar opened this issue Dec 29, 2024 · 2 comments

Comments

@sumukshashidhar

(Continued) Pre-training a model: Unsloth works perfectly without special tokens, but after adding special tokens I get the following error:

```
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2024.12.11: Fast Qwen2 patching. Transformers: 4.47.1.
   \\   /|    GPU: NVIDIA H100 80GB HBM3. Max memory: 79.109 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 9.0. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:07<00:00,  1.25s/it]
Traceback (most recent call last):
  File "/shared/storage-01/users/sumuks2/foundry/paper-reviews-finetuning/src/_experimental/finetune_14_special_toks.py", line 27, in <module>
    add_new_tokens(model, tokenizer, new_tokens = ["<review>", "</review>", "<paper_title>", "</paper_title>", "<paper_abstract>", "</paper_abstract>", "<paper_keywords>", "</paper_keywords>", "<review_title>", "</review_title>", "<review_text>", "</review_text>", "<review_rating>", "</review_rating>", "<review_confidence>", "</review_confidence>"])
  File "/shared/storage-01/users/sumuks2/foundry/paper-reviews-finetuning/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/shared/storage-01/users/sumuks2/foundry/paper-reviews-finetuning/.venv/lib/python3.10/site-packages/unsloth_zoo/tokenizer_utils.py", line 132, in add_new_tokens
    raise RuntimeError(
RuntimeError: Unsloth: Embedding matrix size did not get resized properly. Please file a bug report!
```
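The error is raised inside `add_new_tokens`, which has to make the model's embedding matrix cover the newly added tokens. The library-free sketch below illustrates one hypothetical way a strict size check can trip. It is not Unsloth's code and not a confirmed root cause. All names (`resize_rows`, `add_tokens_and_resize`, `strict_check`, `coverage_check`) are made up for illustration. It assumes that resizing keeps existing rows and truncates or pads to the requested row count, roughly like calling Hugging Face's `resize_token_embeddings(len(tokenizer))` (which initializes new rows differently; zeros are used here only for simplicity), and that some checkpoints ship an embedding matrix with more rows than the tokenizer has entries.

```python
# Hypothetical, library-free sketch; NOT Unsloth's or Transformers' code.
# A "vocab" maps token -> id; the "embedding matrix" is a list of rows.
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def resize_rows(matrix: Matrix, new_num_rows: int, dim: int) -> Matrix:
    """Keep existing rows, then truncate or zero-pad to new_num_rows rows."""
    if new_num_rows <= len(matrix):
        return [row[:] for row in matrix[:new_num_rows]]
    padding = [[0.0] * dim for _ in range(new_num_rows - len(matrix))]
    return [row[:] for row in matrix] + padding


def add_tokens_and_resize(
    vocab: Dict[str, int], matrix: Matrix, new_tokens: List[str], dim: int
) -> Tuple[Matrix, int]:
    """Give unseen tokens the next free id, then resize rows to len(vocab)."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # assumes ids are contiguous 0..n-1
            added += 1
    return resize_rows(matrix, len(vocab), dim), added


def strict_check(old_rows: int, new_rows: int, added: int) -> bool:
    """Hypothetical strict check: rows must grow by exactly `added`."""
    return new_rows == old_rows + added


def coverage_check(vocab: Dict[str, int], matrix: Matrix) -> bool:
    """Weaker check: every token id has a row in the matrix."""
    return len(matrix) >= len(vocab)


if __name__ == "__main__":
    dim = 4
    new_tokens = ["<review>", "</review>"]

    # Case 1: rows == vocab size, so resizing grows the matrix by exactly 2.
    vocab = {f"tok{i}": i for i in range(6)}
    matrix = [[0.0] * dim for _ in range(6)]
    resized, added = add_tokens_and_resize(vocab, matrix, new_tokens, dim)
    print(strict_check(6, len(resized), added), coverage_check(vocab, resized))
    # prints: True True

    # Case 2: matrix padded to 8 rows for a 6-entry vocab. Resizing to
    # len(vocab) == 8 leaves it at 8 rows, so the strict check fails.
    vocab = {f"tok{i}": i for i in range(6)}
    matrix = [[0.0] * dim for _ in range(8)]
    resized, added = add_tokens_and_resize(vocab, matrix, new_tokens, dim)
    print(strict_check(8, len(resized), added), coverage_check(vocab, resized))
    # prints: False True
```

Both cases pass the coverage check; only the strict check tells them apart. If a real check compared the new size against `old_rows + added` instead of the tokenizer length, a padded embedding matrix could produce an error like the one above even though every token still has a row. Whether that is what happens here is unverified.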
@johnpaulbin
Contributor

Same error here for Llama 3 8B.

@danielhanchen
Contributor

Apologies for the delay! I'm working on making adding new tokens much better, hopefully within a few days.
