Xiaotao Hu1,2*, Wei Yin2*§, Mingkai Jia1,2, Junyuan Deng1,2, Xiaoyang Guo2
Qian Zhang2, Xiaoxiao Long1†, Ping Tan1
HKUST1, Horizon Robotics2
* Equal Contribution, † Corresponding Author, § Project Leader
We present DrivingWorld (a world model for autonomous driving), a model that enables efficient autoregressive video and ego-state generation. DrivingWorld formulates future state prediction (ego states and visual observations) in a next-state autoregressive style. DrivingWorld can generate videos of over 40 seconds and achieves high-fidelity, controllable generation.
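For intuition only, below is a minimal conceptual sketch of this next-state autoregressive rollout; the function and argument names are hypothetical and do not reflect the repository's actual API.

```python
# Conceptual sketch (hypothetical names, not the repository's API): the world
# model rolls out the future autoregressively, predicting the next
# (frame tokens, ego state) pair from the full history at every step.
def autoregressive_rollout(predict_next_state, history, num_steps):
    """`predict_next_state` maps the list of past (frame, ego-state) pairs to
    the next pair; `history` holds the conditioning states (e.g. 15 frames)."""
    states = list(history)
    for _ in range(num_steps):
        states.append(predict_next_state(states))  # GPT-style next-state prediction
    return states[len(history):]  # only the newly generated future states
```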
[Dec 2024]
Released the paper, inference code, and a quick start guide.
- Hugging Face demos
- Complete evaluation code
- Video preprocessing code
- Training code
- 🔥 Novel Approach: GPT-style video and ego state generation.
- 🔥 State-of-the-art Performance: high-fidelity and long-duration driving-scene video results.
- 🔥 Controllable Generation: High-fidelity controllable generation with ego poses.
Model | Link |
---|---|
Video VQVAE | link |
World Model | link |
git clone https://github.com/YvanYin/DrivingWorld.git
cd DrivingWorld
pip3 install -r requirements.txt
- Download the pretrained models from Hugging Face, and move the pretrained parameters to
DrivingWorld/pretrained_models/*
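If you prefer scripting the download, a minimal sketch with the `huggingface_hub` library is shown below; the repo id is a placeholder, so substitute the actual repository linked in the model zoo table above.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id: replace it with the Hugging Face repository
# referenced in the model zoo table above.
snapshot_download(
    repo_id="<org>/<drivingworld-checkpoints>",
    local_dir="./pretrained_models",
)
```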
For data preparation, please refer to video_data_preprocess.md for more details.
Script for the default setting of the road-change demo (conditioned on 15 frames, on the demo videos, using top-k sampling):
python3 tools/test_change_road_demo.py \
--config "configs/drivingworld_v1/gen_videovq_conf_demo.py" \
--exp_name "demo_dest_change_road" \
--load_path "./pretrained_models/world_model.pth" \
--save_video_path "./outputs/change_road"
Script for the default setting of the long-term generation demo (conditioned on 15 frames, on the demo videos, using top-k sampling):
python3 tools/test_long_term_demo.py \
--config "configs/drivingworld_v1/gen_videovq_conf_demo.py" \
--exp_name "demo_test_long_term" \
--load_path "./pretrained_models/world_model.pth" \
--save_video_path "./outputs/long_term"
For all generation modes, you can edit the conditioning yaws and poses in the code to obtain different outputs, and you can adjust the sampling parameters in the config files to fit your needs (see the top-k sampling sketch below).
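For reference, the snippet below is a minimal, generic implementation of the top-k sampling used by the demos above; the function name and defaults are illustrative and may differ from the options exposed in the config files.

```python
import torch

# Generic top-k sampling over a logits vector; the sampling parameters in the
# config files (e.g. k, temperature) control this kind of token selection.
def sample_top_k(logits: torch.Tensor, k: int = 100, temperature: float = 1.0) -> torch.Tensor:
    logits = logits / temperature
    topk_vals, topk_idx = torch.topk(logits, k)       # keep the k most likely tokens
    probs = torch.softmax(topk_vals, dim=-1)          # renormalize over the kept tokens
    choice = torch.multinomial(probs, num_samples=1)  # draw one of the kept tokens
    return topk_idx.gather(-1, choice)                # map back to vocabulary ids
```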
If the paper and code of DrivingWorld help your research, we kindly ask you to cite our paper ❤️. If you appreciate our work and find this repository useful, giving it a star ⭐️ is a wonderful way to support us. Thank you very much.
@article{hu2024drivingworld,
title={DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT},
author={Hu, Xiaotao and Yin, Wei and Jia, Mingkai and Deng, Junyuan and Guo, Xiaoyang and Zhang, Qian and Long, Xiaoxiao and Tan, Ping},
journal={arXiv preprint arXiv:2412.19505},
year={2024}
}
We thank the authors of VQGAN, LlamaGen, and Llama 3.1 for their codebases.
This repository is under the MIT License. For more license questions, please contact Wei Yin ([email protected]) and Xiaotao Hu ([email protected]).