Fix cuml parameter setting issues in UMAP/DBSCAN #751
Merged
Fixed an issue where cuml parameters in UMAP were not being set because of a silent failure in `_set_param`, which could not find these parameters in `_param_mapping`. Mapping these params to themselves fixes the issue, as is already done in KNN.

DBSCAN had the same issue: its cuml parameters were never set. It got away with this by retrieving parameters through the Spark API and avoiding the `cuml_params` dict entirely.

Changes:
- Map the UMAP cuml params to themselves in `_param_mapping`.
- Use the `cuml_params` dictionary for DBSCAN initialization, consistent with the other algorithms.
- Update the `_param_mapping` description to avoid future confusion for cuml algorithms without a PySpark equivalent.
- Check the `cuml_params` attribute in tests, and test under non-default cuml parameter settings.

This PR also resolves Issue #749: because `random_state` is now properly set, dataframe sampling is deterministic.
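A minimal, hypothetical sketch of the UMAP failure mode and fix described above. `ToyEstimator`, `FixedToyEstimator`, the plain-dict `_param_mapping`, and the rule that an unmapped name is skipped without error are assumptions made for illustration; this is not the spark-rapids-ml implementation. Only the idea of mapping cuml-only params to themselves comes from this PR.

```python
from typing import Any, Dict, Optional


class ToyEstimator:
    """Hypothetical stand-in for an estimator wrapper (not the real class)."""

    # Assumed shape: Spark param name -> cuml param name.
    # A name missing from this dict is silently dropped by _set_param.
    _param_mapping: Dict[str, Optional[str]] = {
        "maxIter": "max_iter",
    }

    def __init__(self, **kwargs: Any) -> None:
        # Hypothetical cuml defaults.
        self.cuml_params: Dict[str, Any] = {
            "max_iter": 100,
            "n_neighbors": 15,
            "random_state": None,
        }
        for name, value in kwargs.items():
            self._set_param(name, value)

    def _set_param(self, name: str, value: Any) -> None:
        cuml_name = self._param_mapping.get(name)
        if cuml_name is None:
            # Silent failure: cuml-only params never reach cuml_params.
            return
        self.cuml_params[cuml_name] = value


class FixedToyEstimator(ToyEstimator):
    # Fix: cuml params with no PySpark equivalent map to themselves.
    _param_mapping = {
        **ToyEstimator._param_mapping,
        "n_neighbors": "n_neighbors",
        "random_state": "random_state",
    }


if __name__ == "__main__":
    broken = ToyEstimator(n_neighbors=30, random_state=42)
    fixed = FixedToyEstimator(n_neighbors=30, random_state=42)
    print(broken.cuml_params)  # {'max_iter': 100, 'n_neighbors': 15, 'random_state': None}
    print(fixed.cuml_params)   # {'max_iter': 100, 'n_neighbors': 30, 'random_state': 42}
```

In the broken version, `random_state=42` is dropped without any error, so the defaults stay in effect. This is the kind of silent loss that the description links to nondeterministic sampling in Issue #749.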