[Data] Add seed for read files #49129
Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: zhilong <[email protected]>
Hey @Bye-legumes, do we still need your other PR, #46306? It seems this PR does something similar, right? cc @richardliaw
Hi, we don't need PR #46306 if this one can be merged. This is the plan bveeramani mentioned: use a FileShuffleConfig to unify the shuffle.
Got it, maybe we can close the other one to avoid duplication? Thanks!
Closed!
@DeveloperAPI
@dataclass
class FileShuffleConfig:
    """Configuration for file shuffling."""
Can you add a docstring example and arg description? Maybe something like:
# (Pseudocode. You'll need to make this actually run or CI will fail)
>>> import ray
>>> import FileShuffleConfig
>>> read_parquet(...., shuffle=FileShuffleConfig(seed=42))
Also, maybe add a `.. note::` saying that even if you seed the file shuffling, you might still get a somewhat random row order, since tasks complete in a non-deterministic order.
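For readers following the thread, here is a minimal, standard-library sketch of what seeding a file shuffle buys you. It is not Ray's implementation: `SketchFileShuffleConfig` and `shuffle_paths` are hypothetical names made up for illustration, and the file names are placeholders. It only shows that a fixed seed yields the same file order on every call.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchFileShuffleConfig:
    """Hypothetical stand-in for the config under review (not Ray's class)."""

    # Seed for the file-order shuffle; None gives a fresh random order each call.
    seed: Optional[int] = None


def shuffle_paths(
    paths: List[str], config: Optional[SketchFileShuffleConfig]
) -> List[str]:
    """Return the paths in the order a reader would schedule them."""
    if config is None:
        # No shuffling requested: keep the listing order.
        return list(paths)
    # Use a private RNG so the global random state is left untouched.
    rng = random.Random(config.seed)
    shuffled = list(paths)
    rng.shuffle(shuffled)
    return shuffled


# Placeholder file names, assumed for the example.
paths = [f"part-{i:03d}.parquet" for i in range(8)]
first = shuffle_paths(paths, SketchFileShuffleConfig(seed=42))
second = shuffle_paths(paths, SketchFileShuffleConfig(seed=42))
```

With the same seed, `first` and `second` list the files in the same order. This only fixes the order in which files are handed out; as the comment above points out, rows can still arrive in a different order when the read tasks finish at different times.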
Co-authored-by: Balaji Veeramani <[email protected]> Signed-off-by: zhilong <[email protected]>
Close ray-project#46177 --------- Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: zhilong <[email protected]> Signed-off-by: zhilong <[email protected]> Co-authored-by: Balaji Veeramani <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Why are these changes needed?
Close #46177
Related issue number
Checks
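- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.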