Load LMDB/TFRecords file into pytorch datasets #48

Closed
ayushkarnawat opened this issue Jan 23, 2020 · 1 comment · Fixed by #52

ayushkarnawat commented Jan 23, 2020

For efficient loading when training PyTorch models, we should use the torch.utils.data.DataLoader class to load batched data on the fly. To do so, we need to wrap the saved dataset in a data loader that the load_dataset() method can use (its current signature is shown below).

def load_dataset(method, mutator_fmt, labels, rootdir='data/3gb1/processed/',
                 num_data=-1, filetype='h5',
                 as_numpy=False) -> Union[np.ndarray, List[np.ndarray]]:

Related to #35 and #40.
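
As a rough illustration of the on-the-fly loading idea, here is a minimal sketch of a map-style dataset wrapped in a DataLoader. Everything project-specific in it is an assumption: the KeyValueDataset name, the record layout (float32 features followed by one float32 label), and the plain dict standing in for an LMDB store are all hypothetical and are not taken from this repo.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class KeyValueDataset(Dataset):
    """Map-style dataset that decodes one record per index lookup."""

    def __init__(self, store, num_features):
        # `store` is any mapping from byte keys to raw bytes. A plain dict
        # stands in here for an LMDB store (hypothetical layout).
        self.store = store
        self.num_features = num_features

    def __len__(self):
        return len(self.store)

    def __getitem__(self, idx):
        raw = self.store[str(idx).encode()]
        # Assumed record layout: float32 features, then one float32 label.
        # Copy so the array is writable before handing it to torch.
        record = np.frombuffer(raw, dtype=np.float32).copy()
        features = torch.from_numpy(record[: self.num_features])
        label = torch.tensor(record[self.num_features])
        return features, label


# Synthetic stand-in data: 10 records with 4 features and 1 label each.
rng = np.random.default_rng(0)
store = {
    str(i).encode(): rng.random(5, dtype=np.float32).tobytes()
    for i in range(10)
}

loader = DataLoader(
    KeyValueDataset(store, num_features=4), batch_size=4, shuffle=False
)
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
```

Because records are decoded inside `__getitem__`, only the samples needed for the current batch are read, which is the property we want from an LMDB-backed loader.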

ayushkarnawat added the enhancement label (Improvement to existing feature or code) Jan 23, 2020
ayushkarnawat self-assigned this Jan 23, 2020

ayushkarnawat commented

For now, TFRecords files will not be loaded by PyTorch, since reading them requires TensorFlow, which is (a) cumbersome and (b) defeats the purpose of using just one backend (i.e. either PyTorch or TensorFlow).
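
One way this restriction could be enforced is a small guard that runs before any file is opened. The sketch below is purely illustrative: check_filetype, SUPPORTED_FILETYPES, and the exact set of formats listed per backend are hypothetical and do not describe the merged implementation.

```python
# Hypothetical mapping; which formats each backend reads is an assumption.
SUPPORTED_FILETYPES = {
    "pytorch": {"h5", "lmdb"},
    "tensorflow": {"h5", "lmdb", "tfrecords"},
}


def check_filetype(filetype: str, backend: str) -> str:
    """Return the normalized filetype, or raise if the backend cannot read it."""
    filetype = filetype.lower()
    supported = SUPPORTED_FILETYPES[backend]
    if filetype not in supported:
        raise NotImplementedError(
            f"filetype={filetype!r} is not supported with the {backend} "
            f"backend; choose one of {sorted(supported)}"
        )
    return filetype
```

Failing early with an explicit message keeps TensorFlow out of the PyTorch code path entirely, rather than importing it lazily just to parse one file format.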
