rank: 6 cnt=134 sample_size=[480, 720] video_length=49 start to process: This is an animated video showing a medium shot of a young girl with short brown hair wearing a green and white shirt. The girl is standing with her fists clenched and a frustrated expression, her face gradually becoming more sad and tearful.. The video is of high quality, and the view is very clear. High quality, masterpiece, best quality, highres, ultra-detailed, fantastic.
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['detailed, fantastic.']
Sequence context: [CLIP-prompt_embeds(77), T5-prompt_embeds(256), Video_latent(13312)]
A question about the prompt beautification preprocessing: does it mainly optimize the T5 representation? Could the beautified prompt be too long for CLIP, so that truncation at 77 tokens causes the aligned representation to fail?
Judging from the generated results, long prompts give better results than short ones. The problem you mention does exist; it may help to put the most important information at the beginning of the prompt.
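To illustrate the suggestion, here is a minimal sketch of how front-loading the scene description protects it from truncation. It is not this project's code: `build_prompt` and `split_at_budget` are hypothetical helpers, and whitespace-separated words stand in for tokens. A real CLIP BPE tokenizer usually produces more tokens than words, so the split point is only approximate. The default budget of 75 assumes CLIP's 77-token window includes the start and end tokens.

```python
from typing import Tuple


def build_prompt(core: str, boosters: str) -> str:
    """Put the scene description first and the generic quality tags last."""
    return f"{core.strip()} {boosters.strip()}"


def split_at_budget(prompt: str, budget: int = 75) -> Tuple[str, str]:
    """Split a prompt into the part an encoder keeps and the part it drops.

    Assumption: whitespace words approximate tokens. This is NOT how CLIP
    tokenizes; it only illustrates where a fixed-length cutoff falls.
    """
    words = prompt.split()
    return " ".join(words[:budget]), " ".join(words[budget:])


core = "An animated medium shot of a young girl with short brown hair."
boosters = "High quality, masterpiece, best quality, highres, ultra-detailed, fantastic."
prompt = build_prompt(core, boosters)

# A deliberately small budget so the cutoff is visible in a short example.
kept, dropped = split_at_budget(prompt, budget=15)
print("kept:   ", kept)
print("dropped:", dropped)

# With a larger budget (256, matching the T5 length above) nothing is dropped.
kept_long, dropped_long = split_at_budget(prompt, budget=256)
```

With this ordering, whatever falls outside the shorter window is the generic quality suffix rather than the description of the scene, which matches the log above, where only `'detailed, fantastic.'` was cut.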