Xiaotao Hu1,2*, Wei Yin2*§, Mingkai Jia1,2, Junyuan Deng1,2, Xiaoyang Guo2
Qian Zhang2, Xiaoxiao Long1†, Ping Tan1
HKUST1, Horizon Robotics2
* Equal Contribution, † Corresponding Author, § Project Leader
We present DrivingWorld (a world model for autonomous driving), a model that enables efficient autoregressive video and ego-state generation. DrivingWorld formulates future state prediction (ego states and visual observations) in a next-state autoregressive style. DrivingWorld can generate videos of over 40 seconds and achieves high-fidelity, controllable generation.
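For intuition only, below is a minimal conceptual sketch of this next-state autoregressive rollout; the function and argument names are hypothetical and do not reflect the repository's actual API.

```python
# Conceptual sketch (hypothetical names, not the repository's API): the world
# model rolls out the future autoregressively, predicting the next
# (frame tokens, ego state) pair from the full history at every step.
def autoregressive_rollout(predict_next_state, history, num_steps):
    """`predict_next_state` maps the list of past (frame, ego-state) pairs to
    the next pair; `history` holds the conditioning states (e.g. 15 frames)."""
    states = list(history)
    for _ in range(num_steps):
        states.append(predict_next_state(states))  # GPT-style next-state prediction
    return states[len(history):]  # only the newly generated future states
```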
[Dec 2024]
Released the paper, inference code, and a quick start guide.
- Hugging Face demos
- Complete evaluation code
- Video preprocessing code
- Training code
- 🔥 Novel Approach: GPT-style video and ego state generation.
- 🔥 State-of-the-art Performance: high-fidelity and long-duration driving-scene video results.
- 🔥 Controllable Generation: High-fidelity controllable generation with ego poses.
Model | Link |
---|---|
Video VQVAE | link |
World Model | link |
git clone https://github.com/YvanYin/DrivingWorld.git
cd DrivingWorld
pip3 install -r requirements.txt
- Download the pretrained models from Hugging Face, and move the pretrained parameters to
DrivingWorld/pretrained_models/*
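If you prefer scripting the download, a minimal sketch with the `huggingface_hub` library is shown below; the repo id is a placeholder, so substitute the actual repository linked in the model zoo table above.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id: replace it with the Hugging Face repository
# referenced in the model zoo table above.
snapshot_download(
    repo_id="<org>/<drivingworld-checkpoints>",
    local_dir="./pretrained_models",
)
```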
For data preparation, please refer to video_data_preprocess.md for more details.
Script for the default setting of the road-change demo (conditioned on 15 frames, on the demo videos, using top-k sampling):
python3 tools/test_change_road_demo.py \
--config "configs/drivingworld_v1/gen_videovq_conf_demo.py" \
--exp_name "demo_dest_change_road" \
--load_path "./pretrained_models/world_model.pth" \
--save_video_path "./outputs/change_road"
Script for the default setting of the long-term generation demo (conditioned on 15 frames, on the demo videos, using top-k sampling):
python3 tools/test_long_term_demo.py \
--config "configs/drivingworld_v1/gen_videovq_conf_demo.py" \
--exp_name "demo_test_long_term" \
--load_path "./pretrained_models/world_model.pth" \
--save_video_path "./outputs/long_term"
For all generation modes, you can edit the conditioning yaws and poses in the code to obtain different outputs, and you can adjust the sampling parameters in the config files to fit your needs (see the top-k sampling sketch below).
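For reference, the snippet below is a minimal, generic implementation of the top-k sampling used by the demos above; the function name and defaults are illustrative and may differ from the options exposed in the config files.

```python
import torch

# Generic top-k sampling over a logits vector; the sampling parameters in the
# config files (e.g. k, temperature) control this kind of token selection.
def sample_top_k(logits: torch.Tensor, k: int = 100, temperature: float = 1.0) -> torch.Tensor:
    logits = logits / temperature
    topk_vals, topk_idx = torch.topk(logits, k)       # keep the k most likely tokens
    probs = torch.softmax(topk_vals, dim=-1)          # renormalize over the kept tokens
    choice = torch.multinomial(probs, num_samples=1)  # draw one of the kept tokens
    return topk_idx.gather(-1, choice)                # map back to vocabulary ids
```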
If the paper and code of DrivingWorld help your research, we kindly ask you to cite our paper ❤️. If you appreciate our work and find this repository useful, giving it a star ⭐️ is a wonderful way to support us. Thank you very much.
@article{hu2024drivingworld,
title={DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT},
author={Hu, Xiaotao and Yin, Wei and Jia, Mingkai and Deng, Junyuan and Guo, Xiaoyang and Zhang, Qian and Long, Xiaoxiao and Tan, Ping},
journal={arXiv preprint arXiv:2412.19505},
year={2024}
}
We thank the authors of VQGAN, LlamaGen, and Llama 3.1 for their codebases.
This repository is under the MIT License. For more license questions, please contact Wei Yin ([email protected]) and Xiaotao Hu ([email protected]).