
PicklingError: Could not serialize object: TypeError: can't pickle _abc_data objects #5

Open
saichaitanyamolabanti opened this issue May 4, 2022 · 8 comments


@saichaitanyamolabanti

I wanted to try out this package because it implements a PySpark version of Shapley value generation.
So I copy-pasted the "simple.ipynb" notebook into my environment to verify that the basics work, but the code breaks at input cell [32]. Attached are the screenshots; could anyone please look into them?
(screenshots of the PicklingError traceback attached)
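For context, here is a minimal stdlib sketch of the general failure class behind errors like this (not the exact `_abc_data` case from the traceback): the standard `pickle` module serializes many objects by reference, i.e. by module and qualified name, so objects whose qualified name cannot be looked up again fail to pickle. Libraries like cloudpickle serialize such objects by value instead, which is why Spark ships its own copy.

```python
import pickle

# Standard pickle stores functions by reference: it records the module
# and qualified name and re-imports them on load. A lambda has the
# qualname "<lambda>", which cannot be looked up again, so dumping it
# fails -- the same failure class as the _abc_data internals above.
square = lambda x: x * x

try:
    pickle.dumps(square)
    serialized = True
except (pickle.PicklingError, AttributeError):
    serialized = False

print(serialized)  # False: plain pickle cannot serialize this object
```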

@saichaitanyamolabanti
Author

@ijoseph @kevinwang @variablenix @prasad-kamat please help

@ijoseph
Contributor

ijoseph commented May 9, 2022

Wow, we really should have pinned (and pip-compiled) our requirements file below. Let me see if I can get something working and try to update it.
https://github.com/Affirm/shparkley/blob/master/examples/requirements.txt

@ijoseph
Contributor

ijoseph commented May 9, 2022

Alright, @saichaitanyamolabanti, can you please pull PR #7, then run `pip install -r examples/macos-py3.10-requirements.txt` if you happen to have macOS and an empty Python 3.10 environment, or `pip install -r examples/requirements.in` otherwise? That particular set of third-party requirements worked for me.

@saichaitanyamolabanti
Author

Hey @ijoseph, I installed the libraries mentioned in your comments, mainly installing and importing cloudpickle. Here are my observations; I can still see some errors, please help!

Scenario 1:

```python
import cloudpickle
# import pyspark.serializers
# pyspark.serializers.cloudpickle = cloudpickle
```

Then `row = dataset.filter(dataset.xxxx == '5').rdd.first()` works fine.

Scenario 2:

```python
import cloudpickle
import pyspark.serializers
pyspark.serializers.cloudpickle = cloudpickle
```

Then `row = dataset.filter(dataset.xxxx == '5').rdd.first()` throws the below error:

(error screenshot attached)

@saichaitanyamolabanti
Author

I then tried moving the imports of cloudpickle and pyspark.serializers below the row under investigation, like:

```python
row = dataset.filter(dataset.xxxx == '5').rdd.first()
import cloudpickle
import pyspark.serializers
pyspark.serializers.cloudpickle = cloudpickle
```

but I still see an error: cloudpickle doesn't have the method 'print_exec'.
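If I read this failure correctly, the monkey-patch replaces PySpark's bundled cloudpickle, which in PySpark versions of that era exposed a `print_exec` helper used internally by `pyspark.serializers`, with the external cloudpickle package, which does not define it. That internals claim is an assumption on my part, but the defensive idea can be sketched with the stdlib `pickle` module as a stand-in, since it likewise lacks that helper: check for every attribute the consumer needs before swapping the module in.

```python
import pickle  # stand-in for the module you intend to swap in

# Attributes pyspark.serializers is assumed to need from its bundled
# cloudpickle; "print_exec" is the one the error above complains about.
required = ["dumps", "loads", "print_exec"]

missing = [name for name in required if not hasattr(pickle, name)]
print(missing)  # a non-empty list means the swap would break callers
```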

@saichaitanyamolabanti
Author

@ijoseph Or you can consider this scenario:
I tried the same simple.ipynb example after installing the cloudpickle library, importing pyspark.serializers, and pointing it at cloudpickle like below:

```python
import cloudpickle
import pyspark.serializers
pyspark.serializers.cloudpickle = cloudpickle
```

I'm getting this error, please help!

(error screenshot attached)

@saichaitanyamolabanti
Author

@ijoseph @kevinwang @variablenix @prasad-kamat any help ?

@m-aciek

m-aciek commented Jul 5, 2022

Isn't it this issue? It looks like it was solved in pyspark 3.0.0 (PR). So maybe it would be enough to set a lower bound for the pyspark dependency in setup.py?

```python
REQUIRED_PACKAGES = [
    …,
    'pyspark>=3.0.0',
]
```
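If pinning in setup.py is indeed the fix, the example notebook could also fail fast with a clearer message than a pickling traceback. A minimal sketch; the helper name is hypothetical, and the 3.0.0 bound is taken from the linked PR rather than verified here:

```python
def pyspark_is_new_enough(installed: str, minimum=(3, 0, 0)) -> bool:
    """Compare a 'major.minor.patch' version string against a lower bound."""
    parts = tuple(int(p) for p in installed.split(".")[:3])
    return parts >= minimum

print(pyspark_is_new_enough("2.4.8"))  # False: predates the fix
print(pyspark_is_new_enough("3.3.0"))  # True
```

In practice the installed version would come from `pyspark.__version__`; pre-release suffixes (e.g. "3.0.0.dev0") would need extra parsing.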
