Stars
A generative world for general-purpose robotics & embodied AI learning.
Liquid: Language Models are Scalable Multi-modal Generators
Prompts for GPT-4V & DALL-E3 to fully utilize their multi-modal abilities. GPT4V Prompts, DALL-E3 Prompts.
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) fo…
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
The official repository of "Video assistant towards large language model makes everything easy"
Fast and memory-efficient exact attention
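A minimal sketch of the idea this repo implements (FlashAttention), shown via PyTorch's built-in `scaled_dot_product_attention` rather than the repo's own package: on supported GPUs, recent PyTorch versions dispatch this call to a fused FlashAttention-style kernel that computes exact attention without materializing the full attention matrix. Shapes and sizes below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# (batch, heads, seq_len, head_dim) layout expected by PyTorch SDPA.
q = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact causal attention; on supported GPUs this dispatches to a fused
# FlashAttention-style kernel, avoiding the (seq_len x seq_len) matrix.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```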
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
An Extensible Continual Learning Framework Focused on Language Models (LMs)
OpenEQA: Embodied Question Answering in the Era of Foundation Models
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
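A minimal, illustrative sketch of the projection step behind GaLore (not the official galore-torch API): the gradient is projected onto a low-rank subspace spanned by its top singular vectors, so optimizer state can live in that smaller space. The plain-SGD update and all names below are assumptions for brevity; the paper pairs the projection with Adam and refreshes the subspace periodically.

```python
import torch

def galore_sgd_step(W, G, lr=1e-3, rank=4):
    # Projector from the top-r left singular vectors of the gradient.
    U, _, _ = torch.linalg.svd(G, full_matrices=False)
    P = U[:, :rank]            # (m, r)
    R = P.T @ G                # low-rank projected gradient: (r, n)
    W -= lr * (P @ R)          # project back and apply the update
    return W

W = torch.randn(64, 32)
G = torch.randn(64, 32)        # stand-in for a real gradient
W = galore_sgd_step(W, G)
```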
PyTorch code and models for V-JEPA self-supervised learning from video.
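A toy sketch of a JEPA-style objective (a simplification, not the released V-JEPA code): a predictor regresses the latent representations of masked tokens, produced by an EMA target encoder, from the visible tokens, so no pixels are reconstructed. The linear encoders, pooling, masking pattern, and momentum value are placeholders.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, n_tokens = 128, 16
encoder = nn.Linear(dim, dim)             # stand-in for a video transformer
target_encoder = copy.deepcopy(encoder)   # EMA copy, never backpropagated
for p in target_encoder.parameters():
    p.requires_grad_(False)
predictor = nn.Linear(dim, dim)

x = torch.randn(4, n_tokens, dim)         # patchified video clip
mask = torch.zeros(n_tokens, dtype=torch.bool)
mask[::2] = True                          # hide every other token (illustrative)

ctx = encoder(x[:, ~mask])                # encode visible tokens only
with torch.no_grad():
    tgt = target_encoder(x[:, mask])      # latent targets for hidden tokens

# Crude pooled prediction in place of a real cross-attention predictor.
pred = predictor(ctx).mean(dim=1, keepdim=True)
loss = F.l1_loss(pred.expand_as(tgt), tgt)
loss.backward()

# EMA update of the target encoder (momentum is an assumption).
m = 0.99
with torch.no_grad():
    for pt, pc in zip(target_encoder.parameters(), encoder.parameters()):
        pt.mul_(m).add_(pc, alpha=1 - m)
```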
Official Repository for our ICCV 2021 paper: Continual Learning on Noisy Data Streams via Self-Purified Replay
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
[ICLR'23 Oral] Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching
An Awesome Collection of Urban Foundation Models (UFMs).
Code repository for IMU2CLIP (https://arxiv.org/pdf/2210.14395.pdf)
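A minimal sketch of the contrastive alignment idea behind IMU2CLIP (a simplification, not the repo's code): an IMU encoder is trained so its embeddings land next to the frozen CLIP embeddings of the time-aligned video clips, via a symmetric InfoNCE loss. The toy encoder, window size, and temperature are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

imu_encoder = nn.Sequential(nn.Flatten(), nn.Linear(6 * 200, 512))  # toy encoder

imu = torch.randn(8, 6, 200)       # batch of 6-axis IMU windows
clip_emb = torch.randn(8, 512)     # frozen CLIP embeddings of paired clips

z_imu = F.normalize(imu_encoder(imu), dim=-1)
z_clip = F.normalize(clip_emb, dim=-1)

# Symmetric InfoNCE: matched (IMU, clip) pairs sit on the diagonal.
logits = z_imu @ z_clip.T / 0.07   # temperature 0.07 is an assumption
labels = torch.arange(8)
loss = (F.cross_entropy(logits, labels) +
        F.cross_entropy(logits.T, labels)) / 2
loss.backward()
```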
OpenXAI: Towards a Transparent Evaluation of Model Explanations
The champion solution for Ego4D Natural Language Queries Challenge in CVPR 2023
EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties