Feature Request: Train using a text iterator #478

Closed
Santosh-Gupta opened this issue Oct 21, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@Santosh-Gupta

At the moment, it looks like training can only be done from direct paths to text files. This is awkward if we want to apply custom pre-processing, or train on text that lives in a dataset rather than in files on disk.

Being able to train from an iterator would cover these scenarios. One example is the sentencepiece library, which accepts an iterator as its training input:

https://github.com/google/sentencepiece/tree/master/python#training-without-local-filesystem
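To make the request concrete, here is a minimal sketch of the kind of iterator meant above: a generator that applies custom pre-processing to in-memory records and yields text lines, with no intermediate files. The names `iter_preprocessed` and `count_tokens` are hypothetical and used only for illustration; `count_tokens` is a stand-in for a trainer, not an API of this library.

```python
from collections import Counter
from typing import Iterable, Iterator


def iter_preprocessed(records: Iterable[dict]) -> Iterator[str]:
    """Yield one pre-processed text line per record, without writing files."""
    for record in records:
        # Custom pre-processing step (assumed here: strip and lowercase).
        text = record.get("text", "").strip().lower()
        if text:  # skip records with no usable text
            yield text


def count_tokens(lines: Iterable[str]) -> Counter:
    """Hypothetical stand-in for a trainer: consumes any iterable of strings."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


# A small in-memory "dataset"; in practice this could be any iterable of rows.
dataset = [{"text": "Hello world"}, {"text": "  "}, {"text": "hello tokenizers"}]
counts = count_tokens(iter_preprocessed(dataset))
```

As far as I can tell from the README linked above, sentencepiece takes such an iterator through a `sentence_iterator` argument to its trainer; an equivalent entry point here is what this issue is asking for.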

@n1t0
Member

n1t0 commented Oct 22, 2020

Related to #198

n1t0 added the enhancement label Oct 22, 2020
@Narsil
Collaborator

Narsil commented Nov 10, 2020

Closing in favor of #198 to keep the discussion centralised (PR open at #512).

Narsil closed this as completed Nov 10, 2020

3 participants