About multi-text encode too long on V5 #149

Open
xesdiny opened this issue Nov 15, 2024 · 1 comment

Comments

@xesdiny

xesdiny commented Nov 15, 2024

rank: 6 cnt=134 sample_size=[480, 720] video_length=49 start to process:  This is an animated video showing a medium shot of a young girl with short brown hair wearing a green and white shirt. The girl is standing with her fists clenched and a frustrated expression, her face gradually becoming more sad and tearful.. The video is of high quality, and the view is very clear. High quality, masterpiece, best quality, highres, ultra-detailed, fantastic.
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['detailed, fantastic.']
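The warning above reflects CLIP's fixed text context of 77 tokens, which includes the start and end tokens, so only 75 content tokens survive. A minimal sketch of that truncation rule, assuming the prompt has already been tokenized into ids (the ids and the BOS/EOS values here are placeholders, not real CLIP vocabulary entries):

```python
from typing import List, Tuple

CLIP_MAX_LENGTH = 77  # CLIP text context length, including start and end tokens

def truncate_for_clip(content_ids: List[int], bos_id: int, eos_id: int,
                      max_length: int = CLIP_MAX_LENGTH) -> Tuple[List[int], List[int]]:
    """Return (sequence wrapped in BOS/EOS, content ids that were dropped)."""
    budget = max_length - 2          # reserve one slot each for BOS and EOS
    kept = content_ids[:budget]      # everything past the budget is cut off
    dropped = content_ids[budget:]   # this tail is what the warning reports
    return [bos_id] + kept + [eos_id], dropped

# 80 hypothetical content tokens: 75 fit, the last 5 are dropped.
ids = list(range(100, 180))
seq, dropped = truncate_for_clip(ids, bos_id=1, eos_id=2)
print(len(seq), len(dropped))  # 77 5
```

Because truncation always cuts from the end, whatever appears last in the prompt (here, the quality keywords) is what CLIP never sees.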

Sequence context: [CLIP-prompt_embeds (77), T5-prompt_embeds (256), Video_latent (13312)]
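Taking the three lengths above at face value, and assuming they are simply concatenated into one joint sequence (an assumption about the model, not something stated in this thread), the text tokens are a small fraction of the total:

```python
# Lengths copied from the log line above; concatenation is an assumption.
segments = {
    "CLIP-prompt_embeds": 77,
    "T5-prompt_embeds": 256,
    "Video_latent": 13312,
}

total = sum(segments.values())
for name, n in segments.items():
    # Share of the joint sequence occupied by each segment
    print(f"{name}: {n} tokens ({n / total:.2%})")
print("total:", total)  # total: 13645
```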
Regarding the prompt-beautification preprocessing: does it mainly improve the T5 representation? If the beautified prompt is too long for CLIP, could the truncation cause the aligned representation to fail?

@bubbliiiing
Collaborator

From the generated results, long prompts help more than short ones. The problem you mention does exist, though; it may be better to place the most important information at the beginning of the prompt.
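Following that suggestion, one way to structure a prompt is to put the key content first and generic quality keywords last, then check which segments would fall outside CLIP's 75-token content budget. This is a sketch only: `approx_token_count` is a hypothetical stand-in that counts words and punctuation marks, not CLIP's BPE tokenizer, so real counts will differ.

```python
import re
from typing import Callable, List, Tuple

def approx_token_count(text: str) -> int:
    # Hypothetical approximation: one token per word or punctuation mark.
    return len(re.findall(r"\w+|[^\w\s]", text))

def order_and_fit(important: List[str], extra: List[str],
                  budget: int = 75,  # 77 minus start/end tokens
                  count: Callable[[str], int] = approx_token_count
                  ) -> Tuple[str, List[str]]:
    """Place important segments before extra ones and return the joined
    prompt plus the segments that do not fit entirely within `budget`."""
    segments = important + extra
    used = 0
    overflow = []
    for seg in segments:
        used += count(seg)
        if used > budget:  # this segment (and all later ones) spills over
            overflow.append(seg)
    return " ".join(segments), overflow

important = ["A young girl with short brown hair clenches her fists, "
             "her face becoming sad and tearful."]
extra = ["The video is of high quality, and the view is very clear.",
         "High quality, masterpiece, best quality, highres, ultra-detailed, fantastic."]

# A small budget makes the effect visible: only the quality tags overflow.
prompt, overflow = order_and_fit(important, extra, budget=40)
print(overflow)
```

With this ordering, any truncation by CLIP removes the generic quality tags rather than the description of the subject, while T5, with its longer context, still receives the full prompt.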
