
State of video generation in Diffusers #2592

Merged: 13 commits merged into main on Jan 27, 2025

Conversation

Member

@sayakpaul sayakpaul commented Jan 13, 2025

The draft still has some TODOs, but that won't prevent a first pass through the content. @DN6 @a-r-r-o-w could you please fill in the TODOs when you have a moment?

This is why the PR is in WIP mode.

TODOs

  • Post preview
  • Thumbnail

@sayakpaul sayakpaul requested a review from pcuenca January 13, 2025 12:44
_blog.yml Outdated
Comment on lines 5303 to 5304
thumbnail: /blog/assets/video_gen/thumbnail.png
date: Jan 23, 2025
Member Author

Need to be updated.

Member

@Vaibhavs10 Vaibhavs10 left a comment

Very happy to see this in action. cc @LysandreJik for visibility (when you come back).

Member

@Vaibhavs10 Vaibhavs10 left a comment

Did a lightweight review, sorry if it's a bit premature for the state of the blog - very excited to see this finally taking shape!

- user: dn6
---

# State of open video generation models in Diffusers
Member

It might be worth opening with a video from one of the open video models, and maybe even drawing comparisons between where video generation models were a year or two back vs. now!

A good example could be the Will Smith benchmark!

- Fine-tuning
- Looking ahead

## Today’s Video Generation Models and their Limitations
Member

Feel free to disagree, but IMO we should only keep the table here; the limitations could potentially go toward the end of the blog post, which makes it easier to read.

Member Author

I do disagree. I think it's common to start with limitations to give readers fuller context.

Member

It depends on the vibe you are going for - up to you since you're the author. To me it just feels odd to start with limitations, since even a survey paper conveys the limitations towards the end.

- Several open models suffer from limited generalization and underperform user expectations. Models may require prompting in a certain way, or LLM-like prompts, or may fail to generalize to out-of-distribution data, all of which are hurdles for widespread user adoption. For example, models like LTX-Video often need to be prompted in a very detailed and specific way to obtain good-quality generations.
- The high computational and memory demands of video generation result in significant generation latency. For local usage, this is often a roadblock. Most new open video models are inaccessible to community hardware without extensive memory optimizations and quantization approaches that affect both inference latency and the quality of the generated videos.

## Why is Video Generation Hard?
Member

Same for this, it would be nice to keep a positive outlook going in and then ground it towards the end.

Member Author

Same as above.

video_gen.md Outdated
| [`tencent/HunyuanVideo`](https://huggingface.co/tencent/HunyuanVideo) | [Link](https://huggingface.co/tencent/HunyuanVideo/blob/main/LICENSE) |
| [`Lightricks/LTX-Video`](https://huggingface.co/Lightricks/LTX-Video) | [Link](https://huggingface.co/Lightricks/LTX-Video/blob/main/License.txt) |

### **Memory requirements**
Member

It might be beneficial to add inference examples for all/some of the models you mention here, to drive home that Diffusers is the place to go for inference.

Member

Maybe even with video snippets embedded from those, so that people can experience them visually as well.

Member Author

> It might be beneficial to add inference examples for all/some of the models you mention here, to drive home that Diffusers is the place to go for inference.

It will make it unnecessarily verbose. We will do some snippets but will keep it for only one model as we're already citing the docs for the other models. This is a TODO and will be addressed by @DN6.

Member

> It will make it unnecessarily verbose. We will do some snippets but will keep it for only one model as we're already citing the docs for the other models.

Not really, you can just wrap them in `<details>` so that they are collapsed by default.
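For instance, something like the following renders as a collapsed, expandable block on the published page (a minimal sketch of the `<details>` pattern; the summary text is illustrative):

```html
<details>
<summary>Inference code for HunyuanVideo (click to expand)</summary>

<!-- the Python snippet for this model would go here -->

</details>
```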

Member Author

It will make it a bit redundant IMO, as the code doesn't change much. So, showing it for a single model is sufficient, I think.
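For context, a minimal single-model snippet of the kind being discussed could look like the following (a sketch assembled from the benchmark script shared later in this thread; the checkpoint, dtypes, and call arguments mirror that script and are illustrative rather than final blog content):

```python
import torch
from diffusers import HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers.utils import export_to_video

# Load the transformer in bfloat16 and the rest of the pipeline in float16,
# mirroring the benchmark script below.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", transformer=transformer, torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # keep idle components on the CPU to fit smaller GPUs
pipe.vae.enable_tiling()         # decode latents in tiles to cap VAE memory

video = pipe(
    "A cat walks on the grass, realistic.",
    height=512,
    width=768,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(video, "output.mp4", fps=24)
```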

video_gen.md Outdated

We used the same settings as above to obtain these numbers. Quantization was performed with the [`bitsandbytes` library](https://huggingface.co/docs/bitsandbytes/main/en/index) (Diffusers [supports three different quantization backends](https://huggingface.co/docs/diffusers/main/en/quantization/overview) as of now). Also note that, due to numerical precision loss, quantization can impact the quality of the outputs, the effects of which are more prominent in videos than in images.

## Video Generation with Diffusers
Member

More in line with the suggestion above, I'd recommend moving this above optimizations/memory, etc.

Member Author

Resolved.

Member

pcuenca commented Jan 14, 2025

Super cool! Happy to help and/or review as needed.

Member Author

sayakpaul commented Jan 14, 2025

@pcuenca thanks! If you could help generate some of the videos @Vaibhavs10 mentioned, that would be very helpful. This is the script I used for the optims:

Code
from diffusers import HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers import BitsAndBytesConfig as BitsAndBytesConfig
import argparse
import json
import torch 

prompt = "A cat walks on the grass, realistic. The scene resembles a real-life footage and should look as if it was shot in a sunny day."

def load_pipeline(args):
    if args.bit4_bnb:
        quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
        transformer = HunyuanVideoTransformer3DModel.from_pretrained(
            "hunyuanvideo-community/HunyuanVideo",
            subfolder="transformer",
            quantization_config=quant_config,
            torch_dtype=torch.bfloat16,
        )
    elif args.bit8_bnb:
        quant_config = BitsAndBytesConfig(load_in_8bit=True)
        transformer = HunyuanVideoTransformer3DModel.from_pretrained(
            "hunyuanvideo-community/HunyuanVideo",
            subfolder="transformer",
            quantization_config=quant_config,
            torch_dtype=torch.bfloat16,
        )
    else:
        transformer = HunyuanVideoTransformer3DModel.from_pretrained(
            "hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16
        )
    
    pipe = HunyuanVideoPipeline.from_pretrained(
        "hunyuanvideo-community/HunyuanVideo", transformer=transformer, torch_dtype=torch.float16
    )
    
    if not args.enable_model_cpu_offload:
        pipe = pipe.to("cuda")
    else:
        pipe.enable_model_cpu_offload()
    
    if args.vae_tiling:
        pipe.vae.enable_tiling()
    return pipe


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable_model_cpu_offload", type=int, choices=[0, 1])
    parser.add_argument("--vae_tiling", type=int, choices=[0, 1])
    parser.add_argument("--bit4_bnb", type=int, choices=[0, 1])
    parser.add_argument("--bit8_bnb", type=int, choices=[0, 1])
    args = parser.parse_args()

    # Construct output path based on argument values
    output_path = f"4bit@{args.bit4_bnb}_8bit@{args.bit8_bnb}_tiling@{args.vae_tiling}_offload@{args.enable_model_cpu_offload}.json"

    pipe = load_pipeline(args)

    _ = pipe(
        prompt, 
        height=512, 
        width=768, 
        num_frames=121, 
        generator=torch.manual_seed(0),
        num_inference_steps=50
    )

    memory = torch.cuda.max_memory_allocated() / (1024 ** 3)

    # Serialize memory usage info to JSON
    memory_data = {
        "prompt": prompt,
        "height": 512,
        "width": 768,
        "num_frames": 121,
        "num_inference_steps": 50,
        "gpu_memory_usage_gb": memory,
        "enable_model_cpu_offload": args.enable_model_cpu_offload,
        "vae_tiling": args.vae_tiling,
        "bit4_bnb": args.bit4_bnb,
        "bit8_bnb": args.bit8_bnb
    }

    with open(output_path, "w") as json_file:
        json.dump(memory_data, json_file, indent=4)

    print(f"Serialized to {output_path=}")

Completely fine if you don't have time. My plate is a bit full, too. So, it will take time.

Member

@a-r-r-o-w a-r-r-o-w left a comment

Thanks for leading the initiative @sayakpaul. Is there anything specific you'd like me to address? At the moment, I see some TODOs regarding feature PRs that are not merged yet, but we are very close to merging (just needs a final look from @DN6), so we can mention them directly.

Left some other comments as well about how we could nicely showcase memory reduction with/without quantization or other optimizations with upcoming features.

video_gen.md Outdated
Comment on lines 166 to 169
Note that of the above four options, as of now, we only support the first two. Support for the remaining two will be merged soon. If you're interested in following the progress, here are the PRs:

- TODO:
- TODO:
Member

We are very close to merging PAB, which should cover attention & MLP state re-use. For chunked inference, slicing/tiling/FreeNoise-split-inference are great examples already.

For offloading, we currently only have group offloading pending (which might take a while to review and merge), but the PR is 90% ready IMO, so we can mention it -- especially because it has no speed overheads while drastically reducing memory requirements.

So, IMO we should not mention these few lines ("..., we only support the first two")
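For illustration, the slicing/tiling flavour of chunked inference is already a one-liner on the pipeline's VAE (a sketch; only `enable_tiling` is exercised in the benchmark scripts in this thread, and `enable_slicing` is assumed to be available on this VAE as the batch-wise counterpart):

```python
# Chunked decoding: split the VAE workload instead of decoding everything at once.
pipe.vae.enable_tiling()   # decode the latent video in spatial tiles (used in the benchmarks here)
pipe.vae.enable_slicing()  # decode one batch element at a time (assumed available; batch-wise analogue)
```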

Member Author

@sayakpaul sayakpaul Jan 20, 2025

Feel free to perform those changes directly here. I would go with:

> So, IMO we should not mention these few lines ("..., we only support the first two")

And make it clear what's upcoming (the ones you have opened PRs for).

Member Author

@a-r-r-o-w I have taken care of the edits. LMK if that works for you.

video_gen.md Outdated
Comment on lines 130 to 139
| VAE tiling | 43.58 GB |
| CPU offloading | 28.87 GB |
| 8Bit | 49.9 GB |
| 8Bit + CPU offloading* | 35.66 GB |
| 8Bit + VAE tiling | 36.92 GB |
| 8Bit + CPU offloading + VAE tiling | 26.18 GB |
| 4Bit | 42.96 GB |
| 4Bit + CPU offloading | 21.99 GB |
| 4Bit + VAE tiling | 26.42 GB |
| 4Bit + CPU offloading + VAE tiling | 14.15 GB |
Member

@sayakpaul Have we made a note of the time required for each of these methods? IMO it would be helpful for users to understand the tradeoffs that come with each, and the expected slowdown.

It would also set the stage to tease the new banger feature, prefetched offloading, coming soon, which uses the memory of sequential CPU offloading (so around ~3 GB) without compromising speed. CPU RAM requirements are the same as for any other offloading method. LMK what you think.

Member Author

The reason I didn't is:

  1. Video generation is time-consuming, especially HunyuanVideo. Not sure if most users care about inference latency taking a hit because of memory optims.
  2. We don't have the other features merged yet, so I didn't feel comfortable benchmarking them.

If you feel strongly about the timing note, feel free to add the changes.

Member

From the Comfy community side at least, I know that people do care about the time required and try to work with settings that reduce the overall time (lower resolution/frames + latent upscaling, sage attention, fp8 matmul, etc., because they already have support for some good memory optims). So, I think it will be beneficial to mention time here, because if we only cared about reducing memory, everyone would just default to something like sequential CPU offloading.

Could you provide me with the benchmark script you got the current numbers from? I'll run the same and measure time as well.

Member Author

Here:

Code
from diffusers import HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers import BitsAndBytesConfig as BitsAndBytesConfig
import argparse
import json
import torch 

prompt = "A cat walks on the grass, realistic. The scene resembles a real-life footage and should look as if it was shot in a sunny day."

def load_pipeline(args):
    if args.bit4_bnb:
        quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
        transformer = HunyuanVideoTransformer3DModel.from_pretrained(
            "hunyuanvideo-community/HunyuanVideo",
            subfolder="transformer",
            quantization_config=quant_config,
            torch_dtype=torch.bfloat16,
        )
    elif args.bit8_bnb:
        quant_config = BitsAndBytesConfig(load_in_8bit=True)
        transformer = HunyuanVideoTransformer3DModel.from_pretrained(
            "hunyuanvideo-community/HunyuanVideo",
            subfolder="transformer",
            quantization_config=quant_config,
            torch_dtype=torch.bfloat16,
        )
    else:
        transformer = HunyuanVideoTransformer3DModel.from_pretrained(
            "hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16
        )
    
    pipe = HunyuanVideoPipeline.from_pretrained(
        "hunyuanvideo-community/HunyuanVideo", transformer=transformer, torch_dtype=torch.float16
    )
    
    if not args.enable_model_cpu_offload:
        pipe = pipe.to("cuda")
    else:
        pipe.enable_model_cpu_offload()
    
    if args.vae_tiling:
        pipe.vae.enable_tiling()
    return pipe


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable_model_cpu_offload", type=int, choices=[0, 1])
    parser.add_argument("--vae_tiling", type=int, choices=[0, 1])
    parser.add_argument("--bit4_bnb", type=int, choices=[0, 1])
    parser.add_argument("--bit8_bnb", type=int, choices=[0, 1])
    args = parser.parse_args()

    # Construct output path based on argument values
    output_path = f"4bit@{args.bit4_bnb}_8bit@{args.bit8_bnb}_tiling@{args.vae_tiling}_offload@{args.enable_model_cpu_offload}.json"

    pipe = load_pipeline(args)

    _ = pipe(
        prompt, 
        height=512, 
        width=768, 
        num_frames=121, 
        generator=torch.manual_seed(0),
        num_inference_steps=50
    )

    memory = torch.cuda.max_memory_allocated() / (1024 ** 3)

    # Serialize memory usage info to JSON
    memory_data = {
        "prompt": prompt,
        "height": 512,
        "width": 768,
        "num_frames": 121,
        "num_inference_steps": 50,
        "gpu_memory_usage_gb": memory,
        "enable_model_cpu_offload": args.enable_model_cpu_offload,
        "vae_tiling": args.vae_tiling,
        "bit4_bnb": args.bit4_bnb,
        "bit8_bnb": args.bit8_bnb
    }

    with open(output_path, "w") as json_file:
        json.dump(memory_data, json_file, indent=4)

    print(f"Serialized to {output_path=}")

Member Author

I would keep the settings similar, though. If we have to reduce the number of frames, resolution, etc., I'd make a separate note and not change the settings during benchmarking.

Member

@a-r-r-o-w a-r-r-o-w Jan 23, 2025

Here are the results with the time required for each method + FP8 layerwise upcasting, since that PR was merged.

| **Setting**                                        | **Memory**    | **Time** |
|:--------------------------------------------------:|:-------------:|:--------:|
| BF16 Base                                          | 60.10 GB      |  863s    |
| BF16 + CPU offloading                              | 28.87 GB      |  917s    |
| BF16 + VAE tiling                                  | 43.58 GB      |  870s    |
| 8-bit BnB                                          | 49.90 GB      |  983s    |
| 8-bit BnB + CPU offloading*                        | 35.66 GB      | 1041s    |
| 8-bit BnB + VAE tiling                             | 36.92 GB      |  997s    |
| 8-bit BnB + CPU offloading + VAE tiling            | 26.18 GB      | 1260s    |
| 4-bit BnB                                          | 42.96 GB      |  867s    |
| 4-bit BnB + CPU offloading                         | 21.99 GB      |  953s    |
| 4-bit BnB + VAE tiling                             | 26.42 GB      |  889s    |
| 4-bit BnB + CPU offloading + VAE tiling            | 14.15 GB      |  995s    |
| FP8 Upcasting                                      | 51.70 GB      |  856s    |
| FP8 Upcasting + CPU offloading                     | 21.99 GB      |  983s    |
| FP8 Upcasting + VAE tiling                         | 35.17 GB      |  867s    |
| FP8 Upcasting + CPU offloading + VAE tiling        | 20.44 GB      | 1013s    |
| BF16 + Group offload (blocks=8) + VAE tiling       | 15.67 GB      |  925s    |
| BF16 + Group offload (blocks=1) + VAE tiling       |  7.72 GB      |  881s    |
| BF16 + Group offload (leaf) + VAE tiling           |  6.66 GB      |  887s    | 
| FP8 Upcasting + Group offload (leaf) + VAE tiling  |  6.56 GB      |  885s    |

Still haven't added group offloading yet, since I had another idea about optimizing it to further reduce memory. I will for sure be able to send the numbers for it later today. Will push the changes directly EOD.

Member Author

Thanks Aryan!

Member

Here's the updated benchmark code (did not modify the original parts and just kept to the fp8 and group offloading additions)

Code
from diffusers import HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers import BitsAndBytesConfig
import argparse
import json
import torch 
import time
from diffusers.utils import export_to_video
from diffusers.hooks.group_offloading import apply_group_offloading
from diffusers.utils.logging import set_verbosity_debug

set_verbosity_debug()

prompt = "A cat walks on the grass, realistic. The scene resembles a real-life footage and should look as if it was shot in a sunny day."

def load_pipeline(args):
    if args.bit4_bnb:
        quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
        transformer = HunyuanVideoTransformer3DModel.from_pretrained(
            "hunyuanvideo-community/HunyuanVideo",
            subfolder="transformer",
            quantization_config=quant_config,
            torch_dtype=torch.bfloat16,
        )
    elif args.bit8_bnb:
        quant_config = BitsAndBytesConfig(load_in_8bit=True)
        transformer = HunyuanVideoTransformer3DModel.from_pretrained(
            "hunyuanvideo-community/HunyuanVideo",
            subfolder="transformer",
            quantization_config=quant_config,
            torch_dtype=torch.bfloat16,
        )
    else:
        transformer = HunyuanVideoTransformer3DModel.from_pretrained(
            "hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16
        )
    
    if args.layerwise_casting:
        transformer.enable_layerwise_casting(storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16)
    
    pipe = HunyuanVideoPipeline.from_pretrained(
        "hunyuanvideo-community/HunyuanVideo", transformer=transformer, torch_dtype=torch.float16
    )
    
    if not args.enable_model_cpu_offload:
        if args.group_offloading == "0":
            pipe = pipe.to("cuda")
    else:
        pipe.enable_model_cpu_offload()
    
    if args.vae_tiling:
        pipe.vae.enable_tiling()
    return pipe


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable_model_cpu_offload", type=int, choices=[0, 1])
    parser.add_argument("--vae_tiling", type=int, choices=[0, 1])
    parser.add_argument("--bit4_bnb", type=int, choices=[0, 1])
    parser.add_argument("--bit8_bnb", type=int, choices=[0, 1])
    parser.add_argument("--layerwise_casting", type=int, choices=[0, 1])
    parser.add_argument("--group_offloading", type=str, choices=["0", "1", "8", "leaf_level"])
    args = parser.parse_args()

    # Construct output path based on argument values
    output_path = f"group_offloading@{args.group_offloading}_4bit@{args.bit4_bnb}_8bit@{args.bit8_bnb}_tiling@{args.vae_tiling}_offload@{args.enable_model_cpu_offload}_layerwise@{args.layerwise_casting}.json"

    pipe = load_pipeline(args)

    if args.group_offloading != "0":
        apply_group_offloading(
            pipe.text_encoder,
            offload_type="leaf_level",
            offload_device=torch.device("cpu"),
            onload_device=torch.device("cuda"),
            force_offload=True,
            non_blocking=True,
            use_stream=True,
        )
        apply_group_offloading(
            pipe.text_encoder_2,
            offload_type="leaf_level",
            offload_device=torch.device("cpu"),
            onload_device=torch.device("cuda"),
            force_offload=True,
            non_blocking=True,
            use_stream=True,
        )
        apply_group_offloading(
            pipe.transformer,
            offload_type="block_level" if args.group_offloading in ["1", "8"] else "leaf_level",
            num_blocks_per_group=8 if args.group_offloading == "8" else 1 if args.group_offloading == "1" else None,
            offload_device=torch.device("cpu"),
            onload_device=torch.device("cuda"),
            force_offload=True,
            non_blocking=True,
            use_stream=True,
        )
        pipe.vae.to("cuda")
    
        # warmup for prefetch hooks to figure out layer execution order
        _ = pipe(prompt, height=64, width=64, num_frames=9, num_inference_steps=2)

    t1 = time.time()
    video = pipe(
        prompt, 
        height=512, 
        width=768, 
        num_frames=121,
        generator=torch.manual_seed(0),
        num_inference_steps=30,
    )
    t2 = time.time()

    video = video.frames[0]
    export_to_video(video, output_path[:-5] + ".mp4", fps=30)

    memory = torch.cuda.max_memory_allocated() / (1024 ** 3)

    # Serialize memory usage info to JSON
    memory_data = {
        "prompt": prompt,
        "height": 512,
        "width": 768,
        "num_frames": 121,
        "num_inference_steps": 50,
        "gpu_memory_usage_gb": memory,
        "inference_time": round(t2 - t1, 2),
        "enable_model_cpu_offload": args.enable_model_cpu_offload,
        "vae_tiling": args.vae_tiling,
        "bit4_bnb": args.bit4_bnb,
        "bit8_bnb": args.bit8_bnb
    }

    with open(output_path, "w") as json_file:
        json.dump(memory_data, json_file, indent=4)

    print(f"Serialized to {output_path=}")

Member

BF16, 121 frames, 512x768 resolution in under 7 GB (further reduced to under 5 GB with flash attention and an optimized feed-forward, huggingface/diffusers#10623). Did we cook or did we cook? 👨‍🍳

Member Author

sayakpaul commented Jan 20, 2025

Thanks @a-r-r-o-w!

> Is there anything specific you'd like me to address? At the moment, I see some TODOs regarding feature PRs that are not merged yet, but we are very close to merging (just needs a final look from @DN6), so we can mention them directly.

I think you can take care of the comments you added and maybe make some changes to address them? I will try to address VB's comments.

I will let @DN6 take care of the code examples.

export_to_video(video, "output.mp4", fps=24)
```

### Memory requirements
Member Author

@Vaibhavs10 this has been adjusted FYI.

video_gen.md Outdated
Comment on lines 177 to 178
* [Layerwise upcasting](https://github.com/huggingface/diffusers/pull/10347): Lets users store the params and layer outputs in a lower precision such as `torch.float8_e4m3fn` and run computations in a higher precision such as `torch.bfloat16`.
* [Overlapped offloading](https://github.com/huggingface/diffusers/pull/10503): Lets users overlap data transfer with computation using CUDA streams.
Member Author

@a-r-r-o-w if you could help provide your best savings numbers here, that would be nice. For example, we could say:

Layerwise upcasting enables us to save XYZ memory.

Same for overlapped offloading.
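For reference, the layerwise-upcasting path boils down to a single call on the transformer, as used in the updated benchmark script later in this thread (a sketch; the savings numbers themselves are what's being asked for above):

```python
import torch

# Store parameters in FP8 and upcast them to BF16 on the fly during computation.
transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)
```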

Member

@a-r-r-o-w a-r-r-o-w Jan 23, 2025

Btw, I would not call this overlapped offloading, for two reasons:

  • Overlapping is opt-in. It may also be imperfectly overlapped if the computation is much faster than the module transfer (however, the synchronizations put in place make sure no operation starts unexpectedly).
  • The PR's original intention is to allow groups of internal modules to be offloaded together. This helps reduce the memory peaks caused by loading the entire model onto the GPU, by only loading the required modules at a time, performing the computation, and then offloading.
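A sketch of what this looks like in practice, taken from the group-offloading calls in the updated benchmark script below (the exact keyword arguments mirror that script):

```python
import torch
from diffusers.hooks.group_offloading import apply_group_offloading

# Offload groups of transformer blocks together, onloading each group to the GPU only when needed.
apply_group_offloading(
    pipe.transformer,
    offload_type="block_level",
    num_blocks_per_group=1,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    force_offload=True,
    non_blocking=True,
    use_stream=True,  # opt-in: overlap transfers with compute via CUDA streams
)
```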

video_gen.md Outdated
# (Full training command removed for brevity)
```

For more details, check out the repository [here](https://github.com/a-r-r-o-w/finetrainers). We used `finetrainers` to emulate the "dissolve" effect and obtained
Member Author

@Vaibhavs10 provided a fine-tuned model and a result.

We provide more details about these optimizations in the sections below, along with some code snippets. But if you're already feeling excited,
we encourage you to check out [our guide](https://huggingface.co/docs/diffusers/main/en/using-diffusers/text-img2vid).

### Suite of optimizations
Member Author

@DN6 if you could take care of the code, that would be helpful!

@sayakpaul sayakpaul marked this pull request as ready for review January 27, 2025 14:02
@sayakpaul sayakpaul merged commit 4e9fd55 into main Jan 27, 2025
1 check passed
@sayakpaul sayakpaul deleted the video-gen branch January 27, 2025 14:10