Issues: huggingface/datatrove
#333: MinhashBuildIndex fails with StopIteration when initializing priority queue (opened Jan 30, 2025 by nelson-liu)
#328: Bug: Default Adapter assumes type of metadata column in source data (opened Jan 25, 2025 by amangup)
#324: Unexpected behavior when using sentence_dedup with split_sentences=True (opened Jan 18, 2025 by ftgreat)
#298: Unexpected performance degradation behavior in minhash deduplication stage 2 (opened Oct 17, 2024 by Maghoumi)
#295: Does fineweb.py perform Element and paragraph level deduplication? (opened Oct 9, 2024 by silverriver)
#265: Incorrect Job ID Extraction on Clusters with Custom Slurm Output (opened Aug 12, 2024 by StephenRebelSSC)