About multi-text encode too long on V5 #149

xesdiny · 2024-11-15T10:56:55Z

rank: 6 cnt=134 sample_size=[480, 720] video_length=49 start to process:  This is an animated video showing a medium shot of a young girl with short brown hair wearing a green and white shirt. The girl is standing with her fists clenched and a frustrated expression, her face gradually becoming more sad and tearful.. The video is of high quality, and the view is very clear. High quality, masterpiece, best quality, highres, ultra-detailed, fantastic.
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['detailed, fantastic.']

Sequence context: [CLIP-prompt_embeds(77), T5-prompt_embeds(256), Video_latent(13312)]
Regarding the prompt beautification preprocessing: does it mainly improve the T5 representation? Could the beautified prompt become too long for CLIP, causing the aligned representation to fail?

bubbliiiing · 2024-11-18T08:49:47Z

Judging from the generated results, long prompts give better results than short ones. The problem you mentioned does exist; it may help to place the important information at the beginning of the prompt.
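As a rough illustration of the "important information first" suggestion, here is a minimal sketch that puts the scene description ahead of the quality tags and estimates whether the description still fits in CLIP's 77-token window. The names `rough_tokens` and `build_prompt` are hypothetical helpers, not part of this repository, and the word-and-punctuation count is only an assumption standing in for the real CLIP tokenizer, which usually produces more tokens.

```python
import re

CLIP_TOKEN_LIMIT = 77  # limit quoted in the truncation warning above
SPECIAL_TOKENS = 2     # start-of-text and end-of-text markers count toward the limit


def rough_tokens(text):
    # Assumption: one token per word or punctuation mark.
    # A real CLIP BPE tokenizer typically splits text into more pieces.
    return re.findall(r"\w+|[^\w\s]", text)


def build_prompt(important, extras, limit=CLIP_TOKEN_LIMIT):
    """Place `important` first, then `extras`, and estimate the overflow."""
    prompt = f"{important.strip()} {extras.strip()}".strip()
    budget = limit - SPECIAL_TOKENS
    n_important = len(rough_tokens(important))
    n_total = len(rough_tokens(prompt))
    return {
        "prompt": prompt,
        # True if the description alone is estimated to survive truncation.
        "important_fits": n_important <= budget,
        # Estimated number of trailing tokens that would be cut off.
        "estimated_overflow": max(0, n_total - budget),
    }


if __name__ == "__main__":
    info = build_prompt(
        "A young girl with short brown hair clenches her fists and begins to cry.",
        "High quality, masterpiece, best quality, highres, ultra-detailed.",
    )
    print(info)
```

With this ordering, any truncation falls on the trailing quality tags rather than on the scene description. For an exact count, the estimate should be replaced with the tokenizer actually used by the pipeline.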
