I'm trying to explore the SEVIR dataset, and I want to generate a dataset with make_dataset.py.
I have modified make_dataset.py so that each event is split into 2 training samples.
After running make_dataset.py, I got files named nowcast_training_000.h5, nowcast_testing_000.h5, ..., nowcast_training_008.h5, nowcast_testing_008.h5, plus their corresponding xxx_META.csv files. (I kept the parameter "n_chunks" at its default value of 8.)
However, I don't understand how these files relate to each other, and I have the following questions:
Does xxx_000.h5 contain the same data as xxx_001.h5 and the others, just in a different order, or does each file contain different data?
Should I use a single file pair for training and testing (e.g., nowcast_training_000.h5 for training and nowcast_testing_000.h5 for testing), use all of the files, or set the parameter "append" to "True" to write the 8 chunks into one training file and one testing file?
Thanks in advance!
The data in each file is not the same. Each file contains a distinct collection of samples drawn from the full SEVIR dataset, created using the load_batches method. Spreading the data across multiple files keeps the script from writing one giant file. Use the append option if you are okay with a single large file.
How you use the file pairs depends on your resources. If you have limited RAM but want to use all the data, you can load one file at a time and process the files sequentially. Alternatively, if you have access to multiple GPUs, you can assign each file to a different GPU.
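One quick way to see for yourself that the chunks hold different events is to compare the ID columns of two `_META.csv` files — the intersection should be empty. This is only a sketch: the column name `id` and the ID values are assumptions, so check the header of your own META files first.

```python
import csv
import io

def read_ids(fobj, id_col="id"):
    """Collect the values of the ID column from one _META.csv file."""
    return {row[id_col] for row in csv.DictReader(fobj)}

# Synthetic stand-ins for two chunk META files (replace with
# open("nowcast_training_000_META.csv") etc. on real data):
meta0 = io.StringIO("id,event_type\nR001,storm\nR002,storm\n")
meta1 = io.StringIO("id,event_type\nR003,storm\nR004,storm\n")

overlap = read_ids(meta0) & read_ids(meta1)
print(overlap)  # prints set() when the two chunks hold different events
```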
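The limited-RAM case can be sketched as a loop that opens one chunk file at a time with h5py. Note the HDF5 dataset key `"IN"` below is an assumption for illustration — inspect `f.keys()` on the files make_dataset.py actually wrote; the synthetic files at the bottom only stand in for real chunks.

```python
import glob
import os
import tempfile

import h5py
import numpy as np

def iter_chunks(pattern, key="IN"):
    """Yield (path, array) for each chunk file matching `pattern`,
    loading only one chunk into RAM at a time."""
    for path in sorted(glob.glob(pattern)):
        with h5py.File(path, "r") as f:
            # The dataset key is an assumption; check f.keys() on your files.
            yield path, f[key][:]

# Tiny synthetic demo standing in for the real nowcast_training_*.h5 files:
tmp = tempfile.mkdtemp()
for i in range(3):
    with h5py.File(os.path.join(tmp, f"nowcast_training_{i:03d}.h5"), "w") as f:
        f.create_dataset("IN", data=np.full((4, 2), i, dtype=np.float32))

total = 0
for path, arr in iter_chunks(os.path.join(tmp, "nowcast_training_*.h5")):
    total += len(arr)  # a real training pass over this chunk would go here
print(total)  # 12 samples seen across the three chunks
```

A training loop would replace the `total += len(arr)` line with one pass of model fitting per chunk, so peak memory stays at one chunk rather than the whole dataset.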
Hopefully that clears things up
Thanks very much! Your clear explanation helped me understand.