[BUG] cuml's Naive Bayes' MultinomialNB function can't predict or fit_predict #6228

tiraldj · 2025-01-16T01:46:12Z

Describe the bug
I am using MultinomialNB on a medium-sized dataset. The model can be fit to the data and the score function works, but predict and fit_predict do not.

Steps/Code to reproduce bug
("outlier_index" is only used for naming; cdf is a cuDF DataFrame; cIRQ_outliers['outlier_index'] is the dependent variable, an index that goes from 0 to 8)

from cuml.naive_bayes import MultinomialNB
MultinomialNB_outlier_index = MultinomialNB()
MultinomialNB_outlier_index.fit(cdf, cIRQ_outliers['outlier_index'])
MultinomialNB_outlier_index.score(cdf, cIRQ_outliers['outlier_index'])

All of the above works.

trial = MultinomialNB_outlier_index.predict(cdf)
The predict function does not work, and neither does fit_predict, so I can't see the predictions.

Expected behavior
I should be able to see the model's predictions.

Here is the error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /usr/local/lib/python3.11/dist-packages/cuml/internals/array.py:676, in CumlArray.to_output(self, output_type, output_dtype, output_mem_type)
    675     else:
--> 676         return output_mem_type.xdf.Series(
    677             arr, dtype=output_dtype, index=self.index
    678         )
    679 except TypeError:

File /usr/local/lib/python3.11/dist-packages/cudf/utils/performance_tracking.py:51, in _performance_tracking.<locals>.wrapper(*args, **kwargs)
     44     stack.enter_context(
     45         nvtx.annotate(
     46             message=func.__qualname__,
   (...)
     49         )
     50     )
---> 51 return func(*args, **kwargs)

File /usr/local/lib/python3.11/dist-packages/cudf/core/series.py:680, in Series.__init__(self, data, index, dtype, name, copy, nan_as_null)
    668 has_cai = (
    669     type(
    670         inspect.getattr_static(
   (...)
    674     is property
    675 )
    676 column = as_column(
    677     data,
    678     nan_as_null=nan_as_null,
    679     dtype=dtype,
--> 680     length=len(index) if index is not None else None,
    681 )
    682 if copy and has_cai:

TypeError: object of type 'builtin_function_or_method' has no len()

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[54], line 1
----> 1 trial = MultinomialNB_outlier_index.predict(cdf)

File /usr/local/lib/python3.11/dist-packages/cuml/internals/api_decorators.py:192, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    189     else:
    190         return func(*args, **kwargs)
--> 192 return cm.process_return(ret)

File /usr/local/lib/python3.11/dist-packages/cuml/internals/api_context_managers.py:290, in InternalAPIContextBase.process_return(self, ret_val)
    288 def process_return(self, ret_val):
--> 290     return self._process_obj.process_return(ret_val)

File /usr/local/lib/python3.11/dist-packages/cuml/internals/api_context_managers.py:243, in ProcessReturn.process_return(self, ret_val)
    240 def process_return(self, ret_val):
    242     for cb in self._process_return_cbs:
--> 243         ret_val = cb(ret_val)
    245     return ret_val

File /usr/local/lib/python3.11/dist-packages/cuml/internals/api_context_managers.py:438, in ProcessReturnArray.convert_to_outputtype(self, ret_val)
    430     memory_type = self._context.root_cm.memory_type
    432 assert (
    433     output_type is not None
    434     and output_type != "mirror"
    435     and output_type != "input"
    436 ), ("Invalid root_cm.output_type: " "'{}'.").format(output_type)
--> 438 return ret_val.to_output(
    439     output_type=output_type,
    440     output_dtype=self._context.root_cm.output_dtype,
    441     output_mem_type=memory_type,
    442 )

File /usr/local/lib/python3.11/dist-packages/cuml/internals/memory_utils.py:87, in with_cupy_rmm.<locals>.cupy_rmm_wrapper(*args, **kwargs)
     85 if GPU_ENABLED:
     86     with cupy_using_allocator(rmm_cupy_allocator):
---> 87         return func(*args, **kwargs)
     88 return func(*args, **kwargs)

File /usr/local/lib/python3.11/dist-packages/nvtx/nvtx.py:116, in annotate.__call__.<locals>.inner(*args, **kwargs)
    113 @wraps(func)
    114 def inner(*args, **kwargs):
    115     libnvtx_push_range(self.attributes, self.domain.handle)
--> 116     result = func(*args, **kwargs)
    117     libnvtx_pop_range(self.domain.handle)
    118     return result

File /usr/local/lib/python3.11/dist-packages/cuml/internals/array.py:680, in CumlArray.to_output(self, output_type, output_dtype, output_mem_type)
    676             return output_mem_type.xdf.Series(
    677                 arr, dtype=output_dtype, index=self.index
    678             )
    679     except TypeError:
--> 680         raise ValueError("Unsupported dtype for Series")
    681 else:
    682     raise ValueError(
    683         "Only single dimensional arrays can be transformed to"
    684         " Series."
    685     )

ValueError: Unsupported dtype for Series

Environment details (please complete the following information):

Environment location: Cloud (Paperspace)
Linux Distro/Architecture: Ubuntu
GPU Model/Driver: Driver Version: 525.116.04, CUDA Version: 12.0
Method of cuDF & cuML install: pip, using the quick install command:

pip install \
    --extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple \
    "cudf-cu12>=25.2.0a0,<=25.2" "cuml-cu12>=25.2.0a0,<=25.2" \
    "cugraph-cu12>=25.2.0a0,<=25.2" "nx-cugraph-cu12>=25.2.0a0,<=25.2" \
    "cuspatial-cu12>=25.2.0a0,<=25.2" "cuproj-cu12>=25.2.0a0,<=25.2" \
    "cuxfilter-cu12>=25.2.0a0,<=25.2" "cucim-cu12>=25.2.0a0,<=25.2" \
    "dask-cuda>=25.2.0a0,<=25.2"

Additional context
I tried the scikit-learn version, and its predict function works.

dantegd · 2025-01-16T15:21:29Z

Thanks for the issue, @tiraldj. I think this is a problem with converting the output to cuDF. If you want a quick workaround, you should be able to set the output type of cuML to anything else, for example cupy or numpy, and that should work: https://nvidia.slack.com/archives/CAL736F5W/p1737040174809019?thread_ts=1736983642.807349&cid=CAL736F5W

Regardless, we will be fixing this bug as well.
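
For anyone hitting the same error, here is a minimal sketch of the workaround described above. It assumes cuML's `using_output_type` context manager is available in the installed version; the helper name `predict_with_output_type` is hypothetical and only for illustration. I have not verified it against the exact nightly build from the report.

```python
def predict_with_output_type(model, X, output_type="numpy"):
    """Call model.predict(X) while cuML returns `output_type` arrays.

    Hypothetical helper: it sidesteps the cuDF Series conversion that
    raises "Unsupported dtype for Series" by asking cuML for a
    different output container (for example "numpy" or "cupy").
    """
    # Imported lazily so that defining the helper does not require a GPU.
    import cuml

    # Temporarily override the output type for calls made inside the block.
    with cuml.using_output_type(output_type):
        return model.predict(X)
```

With the names from the report, this would be called as `trial = predict_with_output_type(MultinomialNB_outlier_index, cdf)`. Two alternatives that should behave similarly, again as assumptions rather than confirmed fixes: calling `cuml.set_global_output_type("numpy")` once at the top of the notebook, or constructing the estimator with `MultinomialNB(output_type="numpy")`.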

tiraldj added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels on Jan 16, 2025
