[BUG] cuml's Naive Bayes' MultinomialNB function can't predict or fit_predict #6228

Open
tiraldj opened this issue Jan 16, 2025 · 1 comment
Labels: ? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Comments

tiraldj commented Jan 16, 2025

Describe the bug
I am using MultinomialNB on a medium-sized dataset. The model fits the data and the score function works, but predict and fit_predict do not work.

Steps/Code to reproduce bug
(outlier_index is only used for naming; cdf is a cuDF DataFrame; cIRQ_outliers['outlier_index'] is the dependent variable, an index that goes from 0 to 8)

from cuml.naive_bayes import MultinomialNB
MultinomialNB_outlier_index = MultinomialNB()
MultinomialNB_outlier_index.fit(cdf, cIRQ_outliers['outlier_index'])
MultinomialNB_outlier_index.score(cdf, cIRQ_outliers['outlier_index'])

All of the above works.

trial = MultinomialNB_outlier_index.predict(cdf)
The predict function does not work, and neither does fit_predict, so I can't see the predictions.

Expected behavior
I should be able to see the model's predictions

Here is the error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /usr/local/lib/python3.11/dist-packages/cuml/internals/array.py:676, in CumlArray.to_output(self, output_type, output_dtype, output_mem_type)
    675     else:
--> 676         return output_mem_type.xdf.Series(
    677             arr, dtype=output_dtype, index=self.index
    678         )
    679 except TypeError:

File /usr/local/lib/python3.11/dist-packages/cudf/utils/performance_tracking.py:51, in _performance_tracking.<locals>.wrapper(*args, **kwargs)
     44     stack.enter_context(
     45         nvtx.annotate(
     46             message=func.__qualname__,
   (...)
     49         )
     50     )
---> 51 return func(*args, **kwargs)

File /usr/local/lib/python3.11/dist-packages/cudf/core/series.py:680, in Series.__init__(self, data, index, dtype, name, copy, nan_as_null)
    668 has_cai = (
    669     type(
    670         inspect.getattr_static(
   (...)
    674     is property
    675 )
    676 column = as_column(
    677     data,
    678     nan_as_null=nan_as_null,
    679     dtype=dtype,
--> 680     length=len(index) if index is not None else None,
    681 )
    682 if copy and has_cai:

TypeError: object of type 'builtin_function_or_method' has no len()

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[54], line 1
----> 1 trial = MultinomialNB_outlier_index.predict(cdf)

File /usr/local/lib/python3.11/dist-packages/cuml/internals/api_decorators.py:192, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    189     else:
    190         return func(*args, **kwargs)
--> 192 return cm.process_return(ret)

File /usr/local/lib/python3.11/dist-packages/cuml/internals/api_context_managers.py:290, in InternalAPIContextBase.process_return(self, ret_val)
    288 def process_return(self, ret_val):
--> 290     return self._process_obj.process_return(ret_val)

File /usr/local/lib/python3.11/dist-packages/cuml/internals/api_context_managers.py:243, in ProcessReturn.process_return(self, ret_val)
    240 def process_return(self, ret_val):
    242     for cb in self._process_return_cbs:
--> 243         ret_val = cb(ret_val)
    245     return ret_val

File /usr/local/lib/python3.11/dist-packages/cuml/internals/api_context_managers.py:438, in ProcessReturnArray.convert_to_outputtype(self, ret_val)
    430     memory_type = self._context.root_cm.memory_type
    432 assert (
    433     output_type is not None
    434     and output_type != "mirror"
    435     and output_type != "input"
    436 ), ("Invalid root_cm.output_type: " "'{}'.").format(output_type)
--> 438 return ret_val.to_output(
    439     output_type=output_type,
    440     output_dtype=self._context.root_cm.output_dtype,
    441     output_mem_type=memory_type,
    442 )

File /usr/local/lib/python3.11/dist-packages/cuml/internals/memory_utils.py:87, in with_cupy_rmm.<locals>.cupy_rmm_wrapper(*args, **kwargs)
     85 if GPU_ENABLED:
     86     with cupy_using_allocator(rmm_cupy_allocator):
---> 87         return func(*args, **kwargs)
     88 return func(*args, **kwargs)

File /usr/local/lib/python3.11/dist-packages/nvtx/nvtx.py:116, in annotate.__call__.<locals>.inner(*args, **kwargs)
    113 @wraps(func)
    114 def inner(*args, **kwargs):
    115     libnvtx_push_range(self.attributes, self.domain.handle)
--> 116     result = func(*args, **kwargs)
    117     libnvtx_pop_range(self.domain.handle)
    118     return result

File /usr/local/lib/python3.11/dist-packages/cuml/internals/array.py:680, in CumlArray.to_output(self, output_type, output_dtype, output_mem_type)
    676             return output_mem_type.xdf.Series(
    677                 arr, dtype=output_dtype, index=self.index
    678             )
    679     except TypeError:
--> 680         raise ValueError("Unsupported dtype for Series")
    681 else:
    682     raise ValueError(
    683         "Only single dimensional arrays can be transformed to"
    684         " Series."
    685     )

ValueError: Unsupported dtype for Series
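
Note on reading the traceback: the ValueError about an unsupported dtype is raised from an `except TypeError` block, so the underlying failure is the earlier `len(index)` call on a builtin method, not necessarily a dtype problem. The pure-Python sketch below only imitates that control flow. It is not cuML or cuDF code, the function names are hypothetical stand-ins, and using `[].index` as the bad index is an assumption chosen to reproduce the same TypeError message.

```python
def make_series(values, index=None):
    # Stand-in for a Series constructor: it calls len(index) when an index is given.
    length = len(index) if index is not None else None
    return list(values) if length is None else list(values)[:length]


def to_output(values, index=None):
    try:
        return make_series(values, index=index)
    except TypeError:
        # Any TypeError, whatever its cause, is reported as a dtype problem.
        raise ValueError("Unsupported dtype for Series")


# A bound builtin method (here list.index) stands in for whatever ended up in `index`.
bad_index = [].index

try:
    to_output([0, 1, 2], index=bad_index)
except ValueError as exc:
    masked_error = exc
    original_error = exc.__context__  # the TypeError raised by len(index)

print(type(original_error).__name__, "->", original_error)
print(type(masked_error).__name__, "->", masked_error)
```

The chained exception (`__context__`) is what produces the "During handling of the above exception, another exception occurred" section in the report.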

Environment details:

  • Environment location: Cloud (Paperspace)

  • Linux Distro/Architecture: Ubuntu

  • GPU Model/Driver: Driver Version: 525.116.04 CUDA Version: 12.0

  • Method of cuDF & cuML install: pip. I used the quick install command:
    pip install \
    --extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple \
    "cudf-cu12>=25.2.0a0,<=25.2" "cuml-cu12>=25.2.0a0,<=25.2" \
    "cugraph-cu12>=25.2.0a0,<=25.2" "nx-cugraph-cu12>=25.2.0a0,<=25.2" \
    "cuspatial-cu12>=25.2.0a0,<=25.2" "cuproj-cu12>=25.2.0a0,<=25.2" \
    "cuxfilter-cu12>=25.2.0a0,<=25.2" "cucim-cu12>=25.2.0a0,<=25.2" \
    "dask-cuda>=25.2.0a0,<=25.2"

Additional context
I tried the scikit-learn version, and its predict function works.

dantegd (Member) commented Jan 16, 2025

Thanks for the issue, @tiraldj. I think this is a problem with converting the output to cuDF. As a quick workaround, you should be able to set the output type of cuML to anything else, for example cupy or numpy, and that should work: https://nvidia.slack.com/archives/CAL736F5W/p1737040174809019?thread_ts=1736983642.807349&cid=CAL736F5W

Regardless, we will be fixing this bug as well.
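
A minimal sketch of the workaround described above, assuming that switching the output type away from cuDF sidesteps the failing Series conversion (this has not been verified against the reporter's data). The helper name `predict_without_cudf` is hypothetical; `cuml.using_output_type` is cuML's context manager for temporarily overriding the output type.

```python
def predict_without_cudf(model, X, output_type="numpy"):
    """Run model.predict(X) with cuML's output type temporarily overridden.

    Assumption: returning NumPy (or CuPy) output avoids the cuDF Series
    conversion where the reported traceback fails.
    """
    import cuml  # imported here so the helper can be defined without a GPU present

    with cuml.using_output_type(output_type):
        return model.predict(X)


# Usage with the objects from the report (requires a GPU and cuML):
# trial = predict_without_cudf(MultinomialNB_outlier_index, cdf)
# or, for CuPy output:
# trial = predict_without_cudf(MultinomialNB_outlier_index, cdf, output_type="cupy")
```

To change the setting for the whole session instead, `cuml.set_global_output_type("numpy")` can be called once before `predict`.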
