Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] HybridCache not subscriptable #1047

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

hudson-ai
Copy link
Collaborator

Build currently fails due to gemma2's usage of a HybridCache which doesn't support tuple slicing like the friendlier DynamicCache.

"Fixing" this issue (just throwing the cache away...) immediately uncovered another one -- the HybridCache has a maximum size. If we don't set this manually, it is set to the sequence length of the first token sequence the model is called with. Trying to do another forward pass with more tokens leads to exceptions deep down inside of gemma's implementation. Current "fix" is to again... throw the cache away.

Hoping for something more elegant. But I don't think this is too insane for now.

Note: now taking advantage of Cache.crop for cache implementations that support it. This should prevent conversion back and forth from the "legacy" cache format that we previously assumed. (Should fix #986).

@hudson-ai hudson-ai requested a review from paulbkoch October 11, 2024 06:26
Comment on lines 475 to 479
# TODO: this seems to get set to the length of the first sequence we pass for models using
# StaticCache or HybridCache. We need to initialize our own cache with a large enough size
# if we want to continue generation with the same cache.
self._past_key_values = None
past_length = 0
Copy link
Collaborator Author

@hudson-ai hudson-ai Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is necessary for SlidingWindowCache?

@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 46.87500% with 17 lines in your changes missing coverage. Please review.

Project coverage is 60.28%. Comparing base (46340aa) to head (2976cbd).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
guidance/models/transformers/_transformers.py 46.87% 17 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1047       +/-   ##
===========================================
+ Coverage   49.40%   60.28%   +10.88%     
===========================================
  Files          71       71               
  Lines        5852     5880       +28     
===========================================
+ Hits         2891     3545      +654     
+ Misses       2961     2335      -626     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hudson-ai
Copy link
Collaborator Author

Passing all the model specific tests in the CI Tests workflow. General tests failing due to azurecloud auth issues -- not sure if I am able to rerun with the right perms. But I believe all is fine and dandy with the PR.

@hudson-ai
Copy link
Collaborator Author

Changes since submitting PR:

  • When we overflow the size of the cache, reallocate a cache with double the size rather than deleting the old cache, which would cause the next forward pass to only create a cache that is "just big enough".
    • To be seen whether we need to be a bit more conservative here to avoid OOM issues when the sequence length is long.
    • We only do this for Static and Hybrid caches. Other cache types will be deleted until we write implementations for doubling their sizes. We give a warning in this case.
  • When we need to reset the cache due to backtracking, we now call Cache.reset if possible in order to avoid reallocating the cache. This also prevents the cache-doubling from resetting.

@hudson-ai
Copy link
Collaborator Author

@paulbkoch any feedback?

Copy link
Collaborator

@riedgar-ms riedgar-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK with me if it makes the tests work better. But @paulbkoch likely understands the details better

@hudson-ai
Copy link
Collaborator Author

@Harsha-Nori @paulbkoch would appreciate both sets of your eyes on this before merging -- may be necessary to pass CI for build

@Harsha-Nori
Copy link
Collaborator

@paulbkoch would appreciate your 2 cents here too!

@hudson-ai
Copy link
Collaborator Author

@riedgar-ms what's your opinion on the (non-x86) mac-x86 tests? Drop them from the matrix or try and find another runner?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transformers past_key_values deprecated
4 participants