Port to LeRobot Dataset v2.0? #40

ivelin · 2024-12-06T18:54:36Z

Hi folks. Congrats on a great SOTA model!

Is there any interest in porting RDT-1B to the LeRobot API?

I see there is a Hugging Face model upload, but the dataset is not in the LeRobot Dataset v2.0 format:
https://huggingface.co/spaces/lerobot/visualize_dataset

I am looking at porting the data ingestion pipeline, but wanted to check if someone here is already doing that.

Thank you! 🙏🏼

csuastt · 2024-12-10T19:02:18Z

We will consider making it a TODO :)

villekuosmanen · 2024-12-30T10:04:52Z

Hey @csuastt and @ivelin.

I'm not sure exactly what you mean by "port to LeRobot", but I am planning to implement a mechanism to fine-tune the pre-trained RDT model on a LeRobot Dataset v2.0. In practice, this would mean implementing a LeRobot dataset loader similar to data/hdf5_vla_dataset.py.
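To illustrate what such a loader has to do, here is a minimal sketch, not the implementation from this thread or from either repository. It assumes each frame is a dict with `episode_index`, `observation.state`, and `action` keys (the names commonly seen in LeRobot datasets), and that the model wants a fixed-length chunk of future actions per sample. The helper names `build_episode_index` and `sample_chunk` and the `chunk_size` parameter are hypothetical. It is pure Python so it runs without either library installed.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence

Frame = Dict[str, Any]


def build_episode_index(frames: Sequence[Frame]) -> Dict[int, List[int]]:
    """Map each episode id to the ordered list of frame positions it owns."""
    episodes: Dict[int, List[int]] = defaultdict(list)
    for position, frame in enumerate(frames):
        # "episode_index" is an assumed key name.
        episodes[int(frame["episode_index"])].append(position)
    return dict(episodes)


def sample_chunk(
    frames: Sequence[Frame],
    episode_frames: List[int],
    start: int,
    chunk_size: int,
) -> Dict[str, Any]:
    """Return the state at `start` plus the next `chunk_size` actions.

    `start` is an offset within the episode. If the episode ends early,
    the chunk is padded by repeating the last available action, and
    `num_valid` reports how many actions are real.
    """
    positions = episode_frames[start:start + chunk_size]
    if not positions:
        raise IndexError("start is past the end of the episode")
    actions = [list(frames[p]["action"]) for p in positions]
    num_valid = len(actions)
    actions += [list(actions[-1]) for _ in range(chunk_size - num_valid)]
    return {
        "state": list(frames[positions[0]]["observation.state"]),
        "actions": actions,
        "num_valid": num_valid,
    }


if __name__ == "__main__":
    # Toy data: episode 0 has three frames, episode 1 has two.
    toy = [
        {"episode_index": e, "observation.state": [float(t)], "action": [float(t), 0.5]}
        for e, length in ((0, 3), (1, 2))
        for t in range(length)
    ]
    index = build_episode_index(toy)
    print(sample_chunk(toy, index[0], start=1, chunk_size=4))
```

A real loader would also need to return camera images and the language instruction, and to match whatever fields data/hdf5_vla_dataset.py produces; those are left out here because the thread does not describe them.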

I'm not sure exactly how much time I will have for this, but I expect to raise a PR within the next few weeks :) Feel free to assign this issue to me.

csuastt · 2025-01-01T09:01:10Z

Hi @villekuosmanen,

Thank you so much for taking the initiative to work on this! 🎉 We really appreciate your enthusiasm and willingness to contribute to the project. Your idea sounds fantastic, and we're excited to check your implementation.

No rush at all—take your time, and feel free to reach out if you have any questions or need any assistance along the way. We are eagerly looking forward to your PR! 😊

villekuosmanen · 2025-01-07T23:39:30Z

I have a hacky data integration for my flavour of LeRobot Dataset v2 working now (it's on my fork if you are interested). I will make it more robust at some point, test it better, and raise a PR. Until then, here is a super early checkpoint to demo it learning to (sort of) control the arms: https://x.com/VilleKuosmanen/status/1876697826169647412

How many steps would you expect to need to fine-tune the model until it can complete tasks at ~20% accuracy? In the paper you mention this:

> The model is pre-trained on 48 H100 80GB GPUs for a month, giving a total of 1M training iteration steps. It takes three days to fine-tune this model using the same GPUs for 130K steps.

Is this accurate? In my experiments so far, it took around 12 hours on an RTX 4090 (yep, I am compute poor) to reach 60k optimisation steps. Is your definition of a step different from an optimisation step, or do you use a very large batch size?

Also, thanks for open-sourcing the model and the work!

LBG21 · 2025-01-08T06:16:59Z

Hi @villekuosmanen, we used a batch size of $32$ per H100 GPU, giving an effective total batch size of $32 \times 48 = 1536$ across 48 H100 GPUs. With a smaller batch size, reaching good fine-tuning performance takes longer. We have not yet evaluated RDT fine-tuned on an RTX 4090, where I think the batch size would have to be about 2.
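The arithmetic behind this comparison can be sketched as follows. The paper-side numbers (32 per GPU, 48 GPUs, 130K steps) and the 60k steps come from this thread. The batch size of 2 on the RTX 4090 is only the guess given above, not a measured setting, and the function names and the `grad_accum_steps` parameter are hypothetical, added for illustration.

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Samples consumed per optimisation step across all GPUs."""
    return per_gpu_batch * num_gpus * grad_accum_steps


def samples_seen(steps: int, batch_size: int) -> int:
    """Total training samples processed after `steps` optimisation steps."""
    return steps * batch_size


# Fine-tuning setup described in the paper and in the reply above.
paper_batch = effective_batch_size(per_gpu_batch=32, num_gpus=48)
paper_samples = samples_seen(steps=130_000, batch_size=paper_batch)

# Single RTX 4090 run; a batch size of 2 is an assumption, not a reported value.
local_batch = effective_batch_size(per_gpu_batch=2, num_gpus=1)
local_samples = samples_seen(steps=60_000, batch_size=local_batch)

# Gradient accumulation steps a single GPU would need to match the paper's batch.
accum_to_match = paper_batch // local_batch

if __name__ == "__main__":
    print(f"paper: batch {paper_batch}, samples {paper_samples:,}")
    print(f"local: batch {local_batch}, samples {local_samples:,}")
    print(f"sample ratio: {paper_samples / local_samples:.0f}x")
    print(f"accumulation steps to match batch: {accum_to_match}")
```

Under these assumptions, a step in the paper processes 768 times as many samples as a step on the single GPU, so step counts from the two setups are not directly comparable.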

ethan-iai added the `enhancement` (New feature or request) label · Dec 17, 2024
