
Add keep_ram_copy_of_weights config option #7565

Merged (5 commits) into main on Jan 17, 2025
Conversation

RyanJDick (Collaborator) commented on Jan 16, 2025

Summary

This PR adds a `keep_ram_copy_of_weights` config option. The default (and legacy) behavior is `true`. The tradeoffs for this setting are as follows:

  • `keep_ram_copy_of_weights: true`: faster model switching and LoRA patching.
  • `keep_ram_copy_of_weights: false`: lower average RAM load (may not reduce peak RAM significantly).
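For reference, a minimal config fragment showing how the option might be set (assuming InvokeAI's `invokeai.yaml` config file; the surrounding keys are illustrative):

```yaml
# invokeai.yaml — illustrative fragment
# Trade slower model switching / LoRA patching for lower average RAM load.
keep_ram_copy_of_weights: false
```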

Related Issues / Discussions

QA Instructions

  • Test with `enable_partial_load: false` and `keep_ram_copy_of_weights: false`:
    • RAM usage when the model is loaded is reduced.
    • Model loading / unloading works as expected.
    • LoRA patching still works.
  • Test with `enable_partial_load: false` and `keep_ram_copy_of_weights: true`:
    • Behavior should be unchanged.
  • Test with `enable_partial_load: true` and `keep_ram_copy_of_weights: false`:
    • RAM usage when the model is loaded is reduced.
    • Model loading / unloading works as expected.
    • LoRA patching still works.
  • Test with `enable_partial_load: true` and `keep_ram_copy_of_weights: true`:
    • Behavior should be unchanged.
  • Smoke test CPU-only and MPS with default configs.

Merge Plan

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions github-actions bot added python PRs that change python files backend PRs that change backend files services PRs that change app services python-tests PRs that change python tests labels Jan 16, 2025
Base automatically changed from ryan/lower-virtual-memory to main January 16, 2025 23:47
@RyanJDick RyanJDick marked this pull request as ready for review January 16, 2025 23:57
psychedelicious (Collaborator) left a comment
Looks good. I like the one-at-a-time weights xfer to reduce peak memory!
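The one-at-a-time transfer the reviewer mentions can be sketched as follows. This is an illustrative PyTorch sketch, not the PR's actual implementation, and the function name is hypothetical:

```python
import torch


def load_weights_one_at_a_time(state_dict: dict, device: str) -> dict:
    """Move a state dict to `device` tensor by tensor.

    Transferring one tensor at a time keeps peak memory overhead at
    roughly one extra tensor, instead of holding a second full copy of
    every weight for the duration of the transfer.
    """
    for name, tensor in state_dict.items():
        # Replacing the entry drops the last reference to the old tensor,
        # so it can be freed before the next tensor is copied.
        state_dict[name] = tensor.to(device)
    return state_dict
```

The in-place replacement is the key design choice: building a new dict would keep both copies alive until the loop finishes, defeating the purpose.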

@RyanJDick RyanJDick merged commit f7511bf into main Jan 17, 2025
29 checks passed
@RyanJDick RyanJDick deleted the ryan/lower-virtual-memory-2 branch January 17, 2025 00:57
RyanJDick added a commit that referenced this pull request Jan 17, 2025
## Summary

This PR revises the logic for calculating the model cache RAM limit. See
the code for thorough documentation of the change.

The updated logic is more conservative in the amount of RAM that it will
use. This will likely be a better default for more users. Of course,
users can still choose to set a more aggressive limit by overriding the
logic with `max_cache_ram_gb`.
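For users who want to opt out of the heuristic, an override might look like this (assuming InvokeAI's `invokeai.yaml` config file; the value shown is arbitrary):

```yaml
# invokeai.yaml — illustrative fragment
# Bypass the heuristic and pin the model cache RAM limit to a fixed size.
max_cache_ram_gb: 16
```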

## Related Issues / Discussions

- Should help with #7563

## QA Instructions

Exercise all heuristics:
- [x] Heuristic 1
- [x] Heuristic 2
- [x] Heuristic 3
- [x] Heuristic 4

## Merge Plan

- [x] Merge #7565 first and update the target branch

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_