Add SDXL-Lightning quant usage #992

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
122 changes: 122 additions & 0 deletions onediff_diffusers_extensions/examples/lightning/README.md
@@ -0,0 +1,122 @@
# Run SDXL-Lightning with OneDiff

1. [Environment Setup](#environment-setup)
   - [Set Up OneDiff](#set-up-onediff)
   - [Set Up Compiler Backend](#set-up-compiler-backend)
   - [Set Up SDXL-Lightning](#set-up-sdxl-lightning)
2. [Compile](#compile)
   - [Without Compile (Original PyTorch HF Diffusers Baseline)](#run-1024x1024-without-compile-original-pytorch-hf-diffusers-baseline)
   - [With OneFlow Backend](#run-1024x1024-with-compile-oneflow-backend)
   - [With NexFort Backend](#run-1024x1024-with-compile-nexfort-backend)
3. [Quantization (Int8)](#quantization-int8)
   - [With Quantization - OneFlow Backend](#run-1024x1024-with-quantization-oneflow-backend)
   - [With Quantization - NexFort Backend](#run-1024x1024-with-quantization-nexfort-backend)
4. [Performance Comparison](#performance-comparison)
5. [Quality](#quality)

## Environment Setup

### Set Up OneDiff
Follow the installation instructions for OneDiff at https://github.com/siliconflow/onediff?tab=readme-ov-file#installation.

### Set Up Compiler Backend
OneDiff supports two compiler backends, OneFlow and NexFort. Follow the setup instructions for these backends at https://github.com/siliconflow/onediff?tab=readme-ov-file#install-a-compiler-backend.


### Set Up SDXL-Lightning
- HF model: [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning)
- HF pipeline: [diffusers usage](https://huggingface.co/ByteDance/SDXL-Lightning#2-step-4-step-8-step-unet)
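The example script used below derives the number of inference steps from the checkpoint filename (for example `sdxl_lightning_8step_unet.safetensors` → 8 steps). A minimal sketch of that naming convention (the helper name is ours, not part of this PR):

```python
def steps_from_checkpoint(name: str) -> int:
    """Extract the distillation step count from an SDXL-Lightning checkpoint name.

    The released checkpoints use a single digit (2, 4, or 8 steps),
    so only one character after the prefix is read.
    """
    prefix = "sdxl_lightning_"
    return int(name[len(prefix)])

print(steps_from_checkpoint("sdxl_lightning_8step_unet.safetensors"))  # 8
```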

## Compile

> [!NOTE]
> The current test is based on an 8-step distilled model.

### Run 1024x1024 Without Compile (Original PyTorch HF Diffusers Baseline)
```bash
python3 onediff_diffusers_extensions/examples/lightning/text_to_image_sdxl_light.py \
--prompt "product photography, world of warcraft orc warrior, white background" \
--compiler none \
--saved_image sdxl_light.png
```

### Run 1024x1024 With Compile [OneFlow Backend]
```bash
python3 onediff_diffusers_extensions/examples/lightning/text_to_image_sdxl_light.py \
--prompt "product photography, world of warcraft orc warrior, white background" \
--compiler oneflow \
--saved_image sdxl_light_oneflow_compile.png
```

### Run 1024x1024 With Compile [NexFort Backend]
```bash
python3 onediff_diffusers_extensions/examples/lightning/text_to_image_sdxl_light.py \
--prompt "product photography, world of warcraft orc warrior, white background" \
--compiler nexfort \
--compiler-config '{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last", "options": {"triton.fuse_attention_allow_fp16_reduction": false}}' \
--saved_image sdxl_light_nexfort_compile.png
```
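The `--compiler-config` value must be valid JSON; the script parses it with the standard `json` module, so a quick sanity check before launching a long run looks like this (plain JSON parsing, nothing NexFort-specific):

```python
import json

config_str = (
    '{"mode": "max-optimize:max-autotune:low-precision", '
    '"memory_format": "channels_last", '
    '"options": {"triton.fuse_attention_allow_fp16_reduction": false}}'
)
# Raises json.JSONDecodeError on malformed input.
config = json.loads(config_str)
print(config["mode"])
```

Note that JSON booleans are lowercase (`false`, not Python's `False`), a common source of decode errors when editing the string by hand.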


## Quantization (Int8)

> [!NOTE]
> Quantization is a feature of OneDiff Enterprise.

### Run 1024x1024 With Quantization [OneFlow Backend]

Execute the following command to quantize the model, where `--quantized_model` is the output path for the quantized model. For an introduction to the quantization parameters, refer to https://github.com/siliconflow/onediff/blob/main/README_ENTERPRISE.md#diffusers-with-onediff-enterprise

```bash
python3 onediff_diffusers_extensions/tools/quantization/quantize-sd-fast.py \
--quantized_model /path/to/sdxl_lightning_oneflow_quant \
--conv_ssim_threshold 0.1 \
--linear_ssim_threshold 0.1 \
--conv_compute_density_threshold 300 \
--linear_compute_density_threshold 300 \
--save_as_float true \
--use_lightning 1
```
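The quantized model directory includes a `calibrate_info.txt` that the example script parses, one entry per line in the form `<submodule> <value> <bits> <comma-separated floats>` (format inferred from the parsing code in this PR). A standalone sketch of that parsing, with a made-up sample line for illustration:

```python
def parse_calibrate_info(text: str) -> dict:
    """Parse calibrate_info.txt content into {submodule: [value, bits, floats]}."""
    info = {}
    for line in text.strip().splitlines():
        items = line.strip().split(" ")
        info[items[0]] = [
            float(items[1]),
            int(items[2]),
            [float(x) for x in items[3].split(",")],
        ]
    return info

# Hypothetical sample line, for illustration only.
sample = "down_blocks.0.attentions.0.proj_in 0.98 8 0.1,0.2"
print(parse_calibrate_info(sample))
```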

Test the quantized model:

```bash
python3 onediff_diffusers_extensions/examples/lightning/text_to_image_sdxl_light.py \
--prompt "product photography, world of warcraft orc warrior, white background" \
--compiler oneflow \
--use_quantization \
--base /path/to/sdxl_lightning_oneflow_quant \
--saved_image sdxl_light_oneflow_quant.png
```


### Run 1024x1024 With Quantization [NexFort Backend]

```bash
python3 onediff_diffusers_extensions/examples/lightning/text_to_image_sdxl_light.py \
--prompt "product photography, world of warcraft orc warrior, white background" \
--compiler nexfort \
--compiler-config '{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last", "options": {"triton.fuse_attention_allow_fp16_reduction": false}}' \
--use_quantization \
--quantize-config '{"quant_type": "int8_dynamic"}' \
--saved_image sdxl_light_nexfort_quant.png
```


## Performance Comparison

**Testing on an NVIDIA RTX 4090 GPU, using a resolution of 1024x1024 and 8 steps:**

Data update date: 2024-07-29
| Configuration | Iteration Speed (it/s) | E2E Time (seconds) | Warmup time (seconds) <sup>1</sup> | Warmup with Cache time (seconds) |
|---------------------------|------------------------|--------------------|-----------------------|----------------------------------|
| PyTorch | 14.68 | 0.840 | 1.31 | - |
| OneFlow Compile | 29.06 (+97.83%) | 0.530 (-36.90%) | 52.26 | 0.64 |
| OneFlow Quantization | 43.45 (+195.95%) | 0.424 (-49.52%) | 59.87 | 0.51 |
| NexFort Compile | 28.07 (+91.18%) | 0.526 (-37.38%) | 539.67 | 68.79 |
| NexFort Quantization | 30.85 (+110.15%) | 0.476 (-43.33%) | 610.25 | 93.28 |

<sup>1</sup> OneDiff warmup-with-compilation time was measured on an AMD EPYC 7543 32-Core Processor.
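The E2E percentages in the table are relative changes against the PyTorch baseline (0.840 s); they can be reproduced directly:

```python
def pct_change(baseline: float, measured: float) -> float:
    """Percent change of measured vs. baseline (negative = faster E2E time)."""
    return round((measured / baseline - 1) * 100, 2)

# E2E column, baseline 0.840 s:
print(pct_change(0.840, 0.530))  # -36.9  (OneFlow Compile)
print(pct_change(0.840, 0.424))  # -49.52 (OneFlow Quantization)
```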

## Quality
https://github.com/siliconflow/odeval/tree/main/models/lightning
@@ -0,0 +1,198 @@
import argparse
import json
import os

import torch
from diffusers import StableDiffusionXLPipeline
from huggingface_hub import hf_hub_download
from onediffx import compile_pipe, load_pipe, quantize_pipe, save_pipe
from onediffx.utils.performance_monitor import track_inference_time
from safetensors.torch import load_file

try:
    from diffusers.utils import USE_PEFT_BACKEND
except Exception:
    USE_PEFT_BACKEND = False

parser = argparse.ArgumentParser()
parser.add_argument(
    "--base", type=str, default="stabilityai/stable-diffusion-xl-base-1.0"
)
parser.add_argument("--repo", type=str, default="ByteDance/SDXL-Lightning")
parser.add_argument("--cpkt", type=str, default="sdxl_lightning_8step_unet.safetensors")
parser.add_argument("--variant", type=str, default="fp16")
parser.add_argument(
    "--prompt",
    type=str,
    # default="street style, detailed, raw photo, woman, face, shot on CineStill 800T",
    default="A girl smiling",
)
parser.add_argument("--save_graph", action="store_true")
parser.add_argument("--load_graph", action="store_true")
parser.add_argument("--save_graph_dir", type=str, default="cached_pipe")
parser.add_argument("--load_graph_dir", type=str, default="cached_pipe")
parser.add_argument("--height", type=int, default=1024)
parser.add_argument("--width", type=int, default=1024)
parser.add_argument(
    "--saved_image", type=str, required=False, default="sdxl-light-out.png"
)
parser.add_argument("--seed", type=int, default=1)
parser.add_argument(
    "--compiler",
    type=str,
    default="oneflow",
    help="Compiler backend to use. Options: 'none', 'nexfort', 'oneflow'",
)
parser.add_argument(
    "--compiler-config", type=str, help="JSON string for nexfort compiler config."
)
parser.add_argument(
    "--quantize-config", type=str, help="JSON string for nexfort quantization config."
)
parser.add_argument("--bits", type=int, default=8)
parser.add_argument("--use_quantization", action="store_true")


args = parser.parse_args()

OUTPUT_TYPE = "pil"

# Checkpoint names encode a single-digit step count, e.g. "sdxl_lightning_8step_unet.safetensors".
n_steps = int(args.cpkt[len("sdxl_lightning_") : len("sdxl_lightning_") + 1])

is_lora_cpkt = "lora" in args.cpkt

if args.compiler == "oneflow":
    from onediff.schedulers import EulerDiscreteScheduler
else:
    from diffusers import EulerDiscreteScheduler

if is_lora_cpkt:
    if not USE_PEFT_BACKEND:
        print("PEFT backend is required for load_lora_weights")
        exit(0)
    pipe = StableDiffusionXLPipeline.from_pretrained(
        args.base, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    if os.path.isfile(os.path.join(args.repo, args.cpkt)):
        pipe.load_lora_weights(os.path.join(args.repo, args.cpkt))
    else:
        pipe.load_lora_weights(hf_hub_download(args.repo, args.cpkt))
    pipe.fuse_lora()
else:
    if args.use_quantization and args.compiler == "oneflow":
        print("oneflow backend quant...")
        pipe = StableDiffusionXLPipeline.from_pretrained(
            args.base, torch_dtype=torch.float16, variant="fp16"
        ).to("cuda")
        import onediff_quant
        from onediff_quant.utils import replace_sub_module_with_quantizable_module

        quantized_layers_count = 0
        onediff_quant.enable_load_quantized_model()

        calibrate_info = {}
        with open(os.path.join(args.base, "calibrate_info.txt"), "r") as f:
            for line in f.readlines():
                line = line.strip()
                items = line.split(" ")
                calibrate_info[items[0]] = [
                    float(items[1]),
                    int(items[2]),
                    [float(x) for x in items[3].split(",")],
                ]

        for sub_module_name, sub_calibrate_info in calibrate_info.items():
            replace_sub_module_with_quantizable_module(
                pipe.unet,
                sub_module_name,
                sub_calibrate_info,
                False,
                False,
                args.bits,
            )
            quantized_layers_count += 1

        print(f"Total quantized layers: {quantized_layers_count}")
    else:
        from diffusers import UNet2DConditionModel

        unet = UNet2DConditionModel.from_config(args.base, subfolder="unet").to(
            "cuda", torch.float16
        )
        if os.path.isfile(os.path.join(args.repo, args.cpkt)):
            unet.load_state_dict(
                load_file(os.path.join(args.repo, args.cpkt), device="cuda")
            )
        else:
            unet.load_state_dict(
                load_file(hf_hub_download(args.repo, args.cpkt), device="cuda")
            )
        pipe = StableDiffusionXLPipeline.from_pretrained(
            args.base, unet=unet, torch_dtype=torch.float16, variant="fp16"
        ).to("cuda")

pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)

if pipe.vae.dtype == torch.float16 and pipe.vae.config.force_upcast:
    pipe.upcast_vae()

# Compile the pipeline
if args.compiler == "oneflow":
    print("oneflow backend compile...")
    pipe = compile_pipe(pipe)
    if args.load_graph:
        print("Loading graphs...")
        load_pipe(pipe, args.load_graph_dir)
elif args.compiler == "nexfort":
    print("nexfort backend compile...")
    nexfort_compiler_config = (
        json.loads(args.compiler_config) if args.compiler_config else None
    )

    options = nexfort_compiler_config
    pipe = compile_pipe(
        pipe, backend="nexfort", options=options, fuse_qkv_projections=True
    )

if args.use_quantization and args.compiler == "nexfort":
    print("nexfort backend quant...")
    nexfort_quantize_config = (
        json.loads(args.quantize_config) if args.quantize_config else None
    )
    pipe = quantize_pipe(pipe, ignores=[], **nexfort_quantize_config)


# Warmup run
with track_inference_time(warmup=True):
    image = pipe(
        prompt=args.prompt,
        height=args.height,
        width=args.width,
        num_inference_steps=n_steps,
        guidance_scale=0,
        output_type=OUTPUT_TYPE,
    ).images


# Normal run
torch.manual_seed(args.seed)
with track_inference_time(warmup=False):
    image = pipe(
        prompt=args.prompt,
        height=args.height,
        width=args.width,
        num_inference_steps=n_steps,
        guidance_scale=0,
        output_type=OUTPUT_TYPE,
    ).images


image[0].save(args.saved_image)

if args.save_graph:
    print("Saving graphs...")
    save_pipe(pipe, args.save_graph_dir)
@@ -1,10 +1,10 @@
#!/bin/bash

-python3 examples/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_unet.safetensors --save_graph --save_graph_dir cached_unet_pipe
+python3 examples/lightning/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_unet.safetensors --save_graph --save_graph_dir cached_unet_pipe

-python3 examples/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_unet.safetensors --load_graph --load_graph_dir cached_unet_pipe
+python3 examples/lightning/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_unet.safetensors --load_graph --load_graph_dir cached_unet_pipe


-HF_HUB_OFFLINE=0 python3 examples/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_lora.safetensors --save_graph --save_graph_dir cached_lora_pipe
+HF_HUB_OFFLINE=0 python3 examples/lightning/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_lora.safetensors --save_graph --save_graph_dir cached_lora_pipe

-HF_HUB_OFFLINE=0 python3 examples/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_lora.safetensors --load_graph --load_graph_dir cached_lora_pipe
+HF_HUB_OFFLINE=0 python3 examples/lightning/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_lora.safetensors --load_graph --load_graph_dir cached_lora_pipe