Replace BLIP-2 with LLaVA (or keep both) for creating captions #17

cwiggi01 · 2023-11-01T20:56:35Z

I have been looking at LLaVA. It would be nice if someday it could be integrated the way BLIP-2 has been.

LLaVA: Large Language and Vision Assistant

https://github.com/haotian-liu/LLaVA#evaluation

More info:
https://arxiv.org/abs/2304.08485

Paper that compares it to BLIP-2:
https://arxiv.org/pdf/2310.03744.pdf

jhc13 (Maintainer) · 2023-11-02T16:41:00Z

Thank you for the suggestion. I just tried it out and it seems pretty good, although I did notice some hallucination.

Unfortunately, LLaVA has not yet been added to the Transformers library (unlike BLIP-2), so it is difficult to integrate it into TagGUI.
However, I noticed work being done on this issue (here and here); I will consider adding LLaVA once this is complete.

jhc13 (Maintainer) · 2024-01-06T18:38:21Z

LLaVA has been added in v1.9.0.
