-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Partial Loading PR4: Enable partial loading (behind config flag) (#7505)
## Summary This PR adds support for partial loading of models onto the GPU. This enables models to run with much lower peak VRAM requirements (e.g. full FLUX dev with 8GB of VRAM). The partial loading feature is enabled behind a new config flag: `enable_partial_loading=True`. This flag defaults to `False`. **Note about performance:** The `ram` and `vram` config limits are still applied when `enable_partial_loading=True` is set. This can result in significant slowdowns compared to the 'old' behaviour. Consider the case where the VRAM limit is set to `vram=0.75` (GB) and we are trying to run an 8GB model. When `enable_partial_loading=False`, we attempt to load the entire model into VRAM, and if it fits (no OOM error) then it will run at full speed. When `enable_partial_loading=True`, since we have the option to partially load the model we will only load 0.75 GB into VRAM and leave the remaining 7.25 GB in RAM. This will cause inference to be much slower than before. To workaround this, it is important that your `ram` and `vram` configs are carefully tuned. In a future PR, we will add the ability to dynamically set the RAM/VRAM limits based on the available memory / VRAM. ## Related Issues / Discussions - #7492 - #7494 - #7500 ## QA Instructions Tests with `enable_partial_loading=True`, `vram=2`, on CUDA device: For all tests, we expect model memory to stay below 2 GB. Peak working memory will be higher. - [x] SD1 inference - [x] SDXL inference - [x] FLUX non-quantized inference - [x] FLUX GGML-quantized inference - [x] FLUX BnB quantized inference - [x] Variety of ControlNet / IP-Adapter / LoRA smoke tests Tests with `enable_partial_loading=True`, and hack to force all models to load 10%, on CUDA device: - [x] SD1 inference - [x] SDXL inference - [x] FLUX non-quantized inference - [x] FLUX GGML-quantized inference - [x] FLUX BnB quantized inference - [x] Variety of ControlNet / IP-Adapter / LoRA smoke tests Tests with `enable_partial_loading=False`, `vram=30`: We expect no change in behaviour when `enable_partial_loading=False`. - [x] SD1 inference - [x] SDXL inference - [x] FLUX non-quantized inference - [x] FLUX GGML-quantized inference - [x] FLUX BnB quantized inference - [x] Variety of ControlNet / IP-Adapter / LoRA smoke tests Other platforms: - [x] No change in behavior on MPS, even if `enable_partial_loading=True`. - [x] No change in behavior on CPU-only systems, even if `enable_partial_loading=True`. ## Merge Plan - [x] Merge #7500 first, and change the target branch to main ## Checklist - [x] _The PR has a short but descriptive title, suitable for a changelog_ - [x] _Tests added / updated (if applicable)_ - [x] _Documentation added / updated (if applicable)_ - [ ] _Updated `What's New` copy (if doing a release after this PR)_
- Loading branch information
Showing
23 changed files
with
396 additions
and
292 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.