`${BDD100k-Path}` is the path to the dataset directory that you choose.
Required tools:

- parallel
- aria2c or wget
- ffmpeg
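These are standard command-line tools. A minimal install sketch for Debian/Ubuntu (an assumption; the package manager and package names may differ on your system):

```bash
# parallel: run jobs concurrently; aria2/wget: download the video archives;
# ffmpeg: extract image frames from the videos (presumably what the
# get_data scripts rely on). Package names below are for Debian/Ubuntu.
sudo apt-get update
sudo apt-get install -y parallel aria2 wget ffmpeg
```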
If you want to create the dataset quickly, split the work across multiple processes as follows:
- Download the dataset:
  `bash get_data/download_videos.sh ${BDD100k-Path}`
- Unzip the dataset:
  `bash get_data/unzip_videos.sh ${BDD100k-Path}`
- Create the directories for the images used in training:
  `bash get_data/mkdir_train_val_img.sh ${BDD100k-Path}`
- Finally, create the images using multiprocessing.
  Example: process 1900 videos per node across 37 processes (machines); the sketch after this list shows how the per-node commands are generated.
  - 1st node: `bash get_data/create_img.sh ${BDD100k-Path} 1 1900`
  - 2nd node: `bash get_data/create_img.sh ${BDD100k-Path} 1901 1900`
  - ...
  - n-th node: `bash get_data/create_img.sh ${BDD100k-Path} (n-1)*1900+1 1900`
  - ...
  - 37th node: `bash get_data/create_img.sh ${BDD100k-Path} 68401 1900`
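Each node takes a start index and a count: node n starts at video `(n-1)*1900+1` and processes 1900 videos. The following is a minimal sketch (not a script from this repository) that prints the command for every node so each one can be pasted onto the corresponding machine; the node count and chunk size are the assumed values from the example above:

```bash
#!/usr/bin/env bash
# Print one create_img.sh command per node, using the chunking from the
# example above: 37 nodes, 1900 videos each.
BDD100K_PATH=${1:?usage: $0 <BDD100k-Path>}
NUM_NODES=37   # number of machines/processes
CHUNK=1900     # videos handled per node

for n in $(seq 1 "${NUM_NODES}"); do
  start=$(( (n - 1) * CHUNK + 1 ))
  echo "node ${n}: bash get_data/create_img.sh ${BDD100K_PATH} ${start} ${CHUNK}"
done
```

For node 37 this yields a start index of 68401, matching the last line of the example.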
If you don't mind how long dataset creation takes, run the following command instead:
`bash process_bdd.sh ${BDD100k-Path}`
Data structure after completing the above instructions:

```
${BDD100k-Path}
|-- bdd100k
|   |-- videos   # 1.5TB
|   |   |-- train  # 1.3TB
|   |   |-- val    # 184GB
|   |-- images   # 3.5TB
|   |   |-- train  # 3.1TB
|   |   |-- val    # 443GB
```
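Once processing finishes, the layout and sizes can be sanity-checked with standard tools. A minimal sketch, assuming `BDD100K_PATH` holds the directory you used for `${BDD100k-Path}`:

```bash
# Substitute the directory you passed as ${BDD100k-Path}.
BDD100K_PATH=/path/to/your/dataset

# On-disk size of each split (should roughly match the figures above).
du -sh "${BDD100K_PATH}"/bdd100k/videos/train "${BDD100K_PATH}"/bdd100k/videos/val \
       "${BDD100K_PATH}"/bdd100k/images/train "${BDD100K_PATH}"/bdd100k/images/val

# Rough count of extracted frames per split (may take a while at this size).
find "${BDD100K_PATH}/bdd100k/images/train" -type f | wc -l
find "${BDD100K_PATH}/bdd100k/images/val" -type f | wc -l
```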