[LogisticRegression] Match Spark CPU behaviors when dataset has one label #531
Conversation
Signed-off-by: Jinfeng <[email protected]>
build
if len(logistic_regression.classes_) == 1:
    if init_parameters["fit_intercept"] is True:
        model["coef_"] = [[0.0] * logistic_regression.n_cols]
        model["intercept_"] = [float("inf")]
Would the sign of this depend on the label value?
Revised to support -inf for label 0.
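For context, a minimal sketch of the intercept convention discussed here (the dict layout mirrors the diff above; this is not the PR's exact code): with a single class and fit_intercept enabled, the coefficients are all zeros and the intercept's sign follows the label value.

def single_class_model(n_cols, class_val):
    # Trivial single-class model: zero coefficients; the intercept is +inf
    # for label 1.0 and -inf for label 0.0, matching Spark CPU behavior.
    intercept = float("inf") if class_val == 1.0 else float("-inf")
    return {
        "coef_": [[0.0] * n_cols],
        "intercept_": [intercept],
    }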
if len(result["classes_"]) == 1:
    if self.getFitIntercept() is False:
        print(
            "WARNING: All labels belong to a single class and fitIntercept=false. It's a dangerous ground, so the algorithm may not converge."
Can we match Spark's warning?
Is there a way to capture a Spark Scala warning in Python? I tried caplog.set_level() with INFO, WARN, and CRITICAL but got empty log text.
Revised to use logger.warning
        )
    else:
        print(
            "WARNING: All labels are the same value and fitIntercept=true, so the coefficients will be zeros. Training is not needed."
Same here.
Revised
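A minimal sketch of the logger-based version of both warnings (the logger name is an assumption; the warning strings are the ones shown in the diff above):

import logging

logger = logging.getLogger(__name__)  # assumed logger; the PR may use a package-level one

def warn_single_class(fit_intercept):
    # Route the single-class warnings through logging instead of print()
    # so they can be captured in tests (e.g. with pytest's caplog).
    if not fit_intercept:
        logger.warning(
            "All labels belong to a single class and fitIntercept=false. "
            "It's a dangerous ground, so the algorithm may not converge."
        )
    else:
        logger.warning(
            "All labels are the same value and fitIntercept=true, so the "
            "coefficients will be zeros. Training is not needed."
        )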
    assert blor_model.intercept == 0.0
else:
    assert array_equal(blor_model.coefficients.toArray(), [0, 0], 0.0)
    assert blor_model.intercept == float("inf")
Maybe also check what happens when all labels are 0 instead of 1 (i.e., the y values).
added
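A compact sketch of the two expected outcomes the test now covers when fit_intercept is enabled (these are the values discussed above, not output from running the PR's code; in the real test the assertions run against the fitted blor_model):

import math

def expected_trivial_model(label, n_cols=2):
    # Zero coefficients; the intercept sign follows the single label value.
    return [0.0] * n_cols, (math.inf if label == 1.0 else -math.inf)

# Both single-label datasets, as suggested in the review above:
assert expected_trivial_model(1.0) == ([0.0, 0.0], math.inf)
assert expected_trivial_model(0.0) == ([0.0, 0.0], -math.inf)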
class_val = logistic_regression.classes_[0]
assert (
    class_val == 1.0 or class_val == 0.0
), "class value must be either 1. or 0. when dataset has one label"
What does Spark do if the label has only one value but it is not 1 or 0?
Revised.
If the label is < 0, a Java runtime error is raised.
If the label is > 1, Spark trains a multinomial classification, while cuML trains a single-class classification because it uses y.unique().
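A small illustration of the class-derivation difference described above (a sketch of the behavior, not the PR's code): cuML derives classes_ from the unique label values, while Spark conceptually infers the number of classes from the maximum label, so a dataset whose only label is 2.0 is single-class for cuML but multinomial for Spark, and a negative label is rejected on the Spark side.

import numpy as np

y = np.array([2.0, 2.0, 2.0])

# cuML side: classes_ comes from the unique label values -> one class.
cuml_classes = np.unique(y)
print(cuml_classes, len(cuml_classes))   # [2.] 1

# Spark side (conceptually): labels are assumed to be 0..maxLabel, so the
# same data is treated as a 3-class problem; a label < 0 raises an error.
spark_num_classes = int(y.max()) + 1
print(spark_num_classes)                 # 3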
build
blor_model = blor.fit(bdf)

if fit_intercept is False:
    if label == 1.0:
Can caplog be used to check the warning in this case? Like here: https://github.com/NVIDIA/spark-rapids-ml/blob/branch-23.12/python/tests/test_nearest_neighbors.py#L47
Any update on this?
You will probably need to patch the CI docker image in this PR to get tests to pass, as rapidsai-nightly no longer has cuML 23.12. Switch to the rapidsai channel.
build
Added the caplog and a test case to check an invalid label. Just updated the CI docker image, and yes, it seems CI can run.
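A hedged sketch of the caplog-based check (the warning text is from the diff above; the logging call and logger name below are stand-ins for the blor.fit(bdf) that emits the warning in the real test):

import logging

logger = logging.getLogger("spark_rapids_ml")  # assumed logger name

def emit_single_class_warning():
    # Stand-in for blor.fit(bdf), which logs this warning when every row
    # has the same label and fitIntercept=true.
    logger.warning(
        "All labels are the same value and fitIntercept=true, so the "
        "coefficients will be zeros. Training is not needed."
    )

def test_single_class_warning(caplog):
    # caplog only sees records that go through the logging module, which is
    # why the warnings were switched from print() to logger.warning.
    with caplog.at_level(logging.WARNING):
        emit_single_class_warning()
    assert "Training is not needed" in caplog.text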
build
👍