Describe the bug
When label encoding a column of strings, cuML's LabelEncoder is 200x slower when given a cuDF-Pandas DataFrame than when given a pure cuDF DataFrame.
Steps/Code to reproduce bug
%load_ext cudf.pandas
from time import time
import random, string
import pandas as pd, numpy as np, cudf
from cuml.preprocessing import LabelEncoder
def generate_unique_strings(count, length):
    chars = string.ascii_letters + string.digits
    unique_strings = set()
    while len(unique_strings) < count:
        new_string = ''.join(random.choices(chars, k=length))
        unique_strings.add(new_string)
    return list(unique_strings)
unique_strings = generate_unique_strings(1000, 13)
strings = np.random.choice(unique_strings,1_000_000,replace=True)
df_cudf_pandas = pd.DataFrame(strings)
LE = LabelEncoder()
start = time()
df_cudf_pandas[0] = LE.fit_transform(df_cudf_pandas[0])
elapsed_cudf_pandas = time()-start
print(f"cuML Label Encoder with cuDF-Pandas took {elapsed_cudf_pandas:.5f} seconds")
df_cudf = cudf.DataFrame(strings)
LE = LabelEncoder()
start = time()
df_cudf[0] = LE.fit_transform(df_cudf[0])
elapsed_cudf = time()-start
print(f"cuML Label Encoder with pure cuDF took {elapsed_cudf:.5f} seconds")
slowdown = elapsed_cudf_pandas / elapsed_cudf
print(f"cuML Label Encoder is slower by a factor of {slowdown:.0f}x using cuDF-Pandas")
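The reproduction above times each path once with `time()`, so the first measurement may also include one-time setup costs. As a rough sketch (not part of the original report), a small helper could run an untimed warm-up call and then keep the fastest of several timed runs. The name `best_of` and its parameters are hypothetical, and the workload below is a CPU-only stand-in so the snippet runs without a GPU.

```python
from time import perf_counter


def best_of(fn, repeats=5, warmup=1):
    """Return the fastest wall-clock time (in seconds) of fn() over `repeats` runs.

    Hypothetical helper for illustration; not part of cuML or cuDF.
    """
    for _ in range(warmup):
        fn()  # untimed warm-up run, intended to absorb one-time setup costs
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        fn()
        timings.append(perf_counter() - start)
    return min(timings)


# Stand-in workload so the sketch runs anywhere; replace with the real call.
elapsed = best_of(lambda: sorted(range(10_000), reverse=True))
print(f"best of 5 runs: {elapsed:.5f} seconds")
```

In the reproduction, `fn` could be something like `lambda: LabelEncoder().fit_transform(df_cudf[0])`, called without assigning the result back to the column so that every run encodes the same string data.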
Expected behavior
We would expect cuDF-Pandas to run at the same speed as pure cuDF, not 200x slower.
Environment details (please complete the following information):
RAPIDS 24.12