
Add keep_ram_copy_of_weights config option #7565

Merged (5 commits) into main on Jan 17, 2025
Conversation

RyanJDick (Collaborator) commented on Jan 16, 2025

Summary

This PR adds a `keep_ram_copy_of_weights` config option. The default (and legacy) behavior is `true`. The tradeoffs for this setting are as follows:

  • `keep_ram_copy_of_weights: true`: faster model switching and LoRA patching.
  • `keep_ram_copy_of_weights: false`: lower average RAM load (may not reduce peak RAM significantly).
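For reference, a minimal config fragment showing how the option might be set (assuming InvokeAI's `invokeai.yaml` config file; the surrounding keys are illustrative):

```yaml
# invokeai.yaml — illustrative fragment
# Trade slower model switching / LoRA patching for lower average RAM load.
keep_ram_copy_of_weights: false
```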

Related Issues / Discussions

QA Instructions

  • Test with `enable_partial_load: false` and `keep_ram_copy_of_weights: false`:
    • RAM usage when the model is loaded is reduced.
    • Model loading / unloading works as expected.
    • LoRA patching still works.
  • Test with `enable_partial_load: false` and `keep_ram_copy_of_weights: true`:
    • Behavior should be unchanged.
  • Test with `enable_partial_load: true` and `keep_ram_copy_of_weights: false`:
    • RAM usage when the model is loaded is reduced.
    • Model loading / unloading works as expected.
    • LoRA patching still works.
  • Test with `enable_partial_load: true` and `keep_ram_copy_of_weights: true`:
    • Behavior should be unchanged.
  • Smoke test CPU-only and MPS with default configs.

Merge Plan

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions github-actions bot added python PRs that change python files backend PRs that change backend files services PRs that change app services python-tests PRs that change python tests labels Jan 16, 2025
Base automatically changed from ryan/lower-virtual-memory to main January 16, 2025 23:47
@RyanJDick RyanJDick marked this pull request as ready for review January 16, 2025 23:57
psychedelicious (Collaborator) left a comment
Looks good. I like the one-at-a-time weights xfer to reduce peak memory!
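The one-at-a-time transfer the reviewer mentions can be sketched as follows. This is an illustrative PyTorch sketch, not the PR's actual implementation, and the function name is hypothetical:

```python
import torch


def load_weights_one_at_a_time(state_dict: dict, device: str) -> dict:
    """Move a state dict to `device` tensor by tensor.

    Transferring one tensor at a time keeps peak memory overhead at
    roughly one extra tensor, instead of holding a second full copy of
    every weight for the duration of the transfer.
    """
    for name, tensor in state_dict.items():
        # Replacing the entry drops the last reference to the old tensor,
        # so it can be freed before the next tensor is copied.
        state_dict[name] = tensor.to(device)
    return state_dict
```

The in-place replacement is the key design choice: building a new dict would keep both copies alive until the loop finishes, defeating the purpose.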

@RyanJDick RyanJDick merged commit f7511bf into main Jan 17, 2025
29 checks passed
@RyanJDick RyanJDick deleted the ryan/lower-virtual-memory-2 branch January 17, 2025 00:57
RyanJDick added a commit that referenced this pull request Jan 17, 2025
## Summary

This PR revises the logic for calculating the model cache RAM limit. See
the code for thorough documentation of the change.

The updated logic is more conservative in the amount of RAM that it will
use. This will likely be a better default for more users. Of course,
users can still choose to set a more aggressive limit by overriding the
logic with `max_cache_ram_gb`.
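For users who want to opt out of the heuristic, an override might look like this (assuming InvokeAI's `invokeai.yaml` config file; the value shown is arbitrary):

```yaml
# invokeai.yaml — illustrative fragment
# Bypass the heuristic and pin the model cache RAM limit to a fixed size.
max_cache_ram_gb: 16
```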

## Related Issues / Discussions

- Should help with #7563

## QA Instructions

Exercise all heuristics:
- [x] Heuristic 1
- [x] Heuristic 2
- [x] Heuristic 3
- [x] Heuristic 4

## Merge Plan

- [x] Merge #7565 first and update the target branch

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_