does cogvideox-fun support t2v finetuning? #109

fenghe12 · 2025-01-09T08:41:57Z

I get the following error if I simply change the training mode from "inpaint" to "normal":
RuntimeError: Given groups=1, weight of size [1920, 33, 2, 2], expected input[2, 16, 80, 80] to have 33 channels, but got 16 channels instead
Traceback (most recent call last):
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/code/CogVideoX-Fun/scripts/train.py", line 1706, in <module>
    main()
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/code/CogVideoX-Fun/scripts/train.py", line 1559, in main
    noise_pred = transformer3d(
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/code/CogVideoX-Fun/cogvideox/models/transformer3d.py", line 474, in forward
    hidden_states = self.patch_embed(encoder_hidden_states, hidden_states)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/code/CogVideoX-Fun/cogvideox/models/transformer3d.py", line 67, in forward
    image_embeds = self.proj(image_embeds)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/pfs-mc0p4k/tts/team/digital_avatar_group/fenghe/conda_envs/easyanimate/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [1920, 33, 2, 2], expected input[6, 16, 48, 48] to have 33 channels, but got

fenghe12 · 2025-01-09T08:42:33Z

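It seems the transformer is still using the config of the "inpaint" training mode: the patch embedding weight expects 33 input channels, but in "normal" mode the input has only 16 channels.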
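To make the mismatch concrete, here is a minimal sketch of a pre-flight check that could run before the forward pass. The function names are hypothetical and not part of CogVideoX-Fun. The breakdown of 33 channels as 16 noisy latents + 1 mask channel + 16 masked-video latents is an assumption; only the numbers 33 and 16 come from the error message.

```python
# Hypothetical helper names; only the channel counts 33 and 16 come from the error.
LATENT_CHANNELS = 16  # channel count of the input tensor reported in the error


def input_channels(train_mode, latent_channels=LATENT_CHANNELS):
    """Return how many channels the training loop would feed the transformer."""
    if train_mode == "normal":
        # Plain t2v: only the noisy latents are passed in.
        return latent_channels
    if train_mode == "inpaint":
        # Assumed layout: noisy latents + 1-channel mask + masked-video latents.
        return latent_channels + 1 + latent_channels
    raise ValueError(f"unknown train_mode: {train_mode!r}")


def check_channels(train_mode, model_in_channels):
    """Fail early with a readable message instead of deep inside conv2d."""
    got = input_channels(train_mode)
    if got != model_in_channels:
        raise ValueError(
            f"train_mode={train_mode!r} feeds {got} channels, but the "
            f"checkpoint's patch embedding expects {model_in_channels}"
        )
    return got


if __name__ == "__main__":
    check_channels("inpaint", 33)  # consistent, passes silently
    try:
        check_channels("normal", 33)  # the situation reported in this issue
    except ValueError as err:
        print(err)
```

If this reading is right, changing the training mode alone is not enough: the loaded transformer weights or config would also need a patch embedding built for 16 input channels.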

fenghe12 changed the title ~~dose cogvideox-fun supports t2v finetuning?~~ does cogvideox-fun supports t2v finetuning? Jan 13, 2025
