Skip to content

Commit

Permalink
Update Transformers to 4.35 and fix pad_to_multiple_of (#4817)
Browse files Browse the repository at this point in the history
## What type of PR is this? (check all applicable)

- [ ] Refactor
- [ ] Feature
- [X] Bug Fix
- [X] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission


## Have you discussed this change with the InvokeAI team?
- [X] Yes, with @blessedcoolant 
- [ ] No, because:

      
## Have you updated all relevant documentation?
- [ ] Yes
- [ ] No


## Description

This PR updates Transformers to the most recent version and fixes the
value `pad_to_multiple_of` for `text_encoder.resize_token_embeddings`
which was introduced with
huggingface/transformers#25088 in Transformers
4.32.0.

According to the [Nvidia
Documentation](https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc),
`Performance is better when equivalent matrix dimensions M, N, and K are
aligned to multiples of 8 bytes (or 64 bytes on A100) for FP16`
This fixes the following error that was popping up before every
invocation starting with Transformers 4.32.0
`You are resizing the embedding layer without providing a
pad_to_multiple_of parameter. This means that the new embedding
dimension will be None. This might induce some performance reduction as
Tensor Cores will not be available. For more details about this, or help
on choosing the correct value for resizing, refer to this guide:
https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc`

This is my first "real" fix PR, so I hope this is fine. Please inform me
if there is anything wrong with this. I am glad to help.

Have a nice day and thank you!


## Related Tickets & Documents

<!--
For pull requests that relate or close an issue, please include them
below. 

For example having the text: "closes #1234" would connect the current
pull
request to issue 1234.  And when we merge the pull request, Github will
automatically close the issue.
-->

- Related Issue:
huggingface/transformers#26303
- Related Discord discussion:
https://discord.com/channels/1020123559063990373/1154152783579197571
- Closes #

## QA Instructions, Screenshots, Recordings

<!-- 
Please provide steps on how to test changes, any hardware or 
software specifications as well as any other pertinent information. 
-->

## Added/updated tests?

- [ ] Yes
- [ ] No : _please replace this line with details on why tests
      have not been included_

## [optional] Are there any post deployment tasks we need to perform?
  • Loading branch information
Millu authored Nov 10, 2023
2 parents 8702a63 + 6001d3d commit d63a614
Show file tree
Hide file tree
Showing 2 changed files with 13 additions and 4 deletions.
15 changes: 12 additions & 3 deletions invokeai/backend/model_management/lora.py
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,15 @@ def apply_ti(
init_tokens_count = None
new_tokens_added = None

# TODO: This is required since Transformers 4.32 see
# https://github.com/huggingface/transformers/pull/25088
# More information by NVIDIA:
# https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
# This value might need to be changed in the future and take the GPUs model into account as there seem
# to be ideal values for different GPUS. This value is temporary!
# For references to the current discussion please see https://github.com/invoke-ai/InvokeAI/pull/4817
pad_to_multiple_of = 8

try:
# HACK: The CLIPTokenizer API does not include a way to remove tokens after calling add_tokens(...). As a
# workaround, we create a full copy of `tokenizer` so that its original behavior can be restored after
Expand All @@ -175,7 +184,7 @@ def apply_ti(
# but a pickle roundtrip was found to be much faster (1 sec vs. 0.05 secs).
ti_tokenizer = pickle.loads(pickle.dumps(tokenizer))
ti_manager = TextualInversionManager(ti_tokenizer)
init_tokens_count = text_encoder.resize_token_embeddings(None).num_embeddings
init_tokens_count = text_encoder.resize_token_embeddings(None, pad_to_multiple_of).num_embeddings

def _get_trigger(ti_name, index):
trigger = ti_name
Expand All @@ -190,7 +199,7 @@ def _get_trigger(ti_name, index):
new_tokens_added += ti_tokenizer.add_tokens(_get_trigger(ti_name, i))

# modify text_encoder
text_encoder.resize_token_embeddings(init_tokens_count + new_tokens_added)
text_encoder.resize_token_embeddings(init_tokens_count + new_tokens_added, pad_to_multiple_of)
model_embeddings = text_encoder.get_input_embeddings()

for ti_name, ti in ti_list:
Expand Down Expand Up @@ -222,7 +231,7 @@ def _get_trigger(ti_name, index):

finally:
if init_tokens_count and new_tokens_added:
text_encoder.resize_token_embeddings(init_tokens_count)
text_encoder.resize_token_embeddings(init_tokens_count, pad_to_multiple_of)

@classmethod
@contextmanager
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ dependencies = [
"torchvision~=0.16",
"torchmetrics~=0.11.0",
"torchsde~=0.2.5",
"transformers~=4.31.0",
"transformers~=4.35.0",
"uvicorn[standard]~=0.21.1",
"windows-curses; sys_platform=='win32'",
]
Expand Down

0 comments on commit d63a614

Please sign in to comment.