Arrays with null values are written as empty tags on the XML file #692
Comments
I don't know if one is more right than the other. They are slightly different situations: a child with nothing in it vs. a parent with no children. That said, I don't think the current behavior is strongly motivated; it's just how it happened. I would probably not change the behavior at this point unless it's demonstrably problematic.
Hi srowen, thanks for your fast reply. With plain fields, if a value in the DataFrame is null, it is not written to the XML file at all, so I was expecting the same for an empty array (or at least an option for it).
I think there's a difference between
Yes, I agree that an empty array is different from a None (so indeed, I would not change the default behavior). However, for big data workloads, an option to write or omit empty nested arrays would be really helpful, because it would reduce the size of the XML file. In my case, for example, I have DataFrames nested 2-3 levels deep, and the result is a 100GB file full of empty tags for those arrays. The output looks something like this for each row:
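To make the distinction discussed in this thread concrete, here is a toy serializer in plain Python. It is not spark-xml's writer and does not reproduce its exact output format; it only models the two behaviours being compared: a null field is skipped entirely, while an empty array is written as an empty element unless a flag is set. The `omit_empty_arrays` parameter and the `row_to_xml` helper are hypothetical and stand in for the option requested above; they are not part of spark-xml.

```python
import xml.etree.ElementTree as ET


def row_to_xml(tag, row, omit_empty_arrays=False):
    """Toy model (not spark-xml): serialize a nested dict to an Element.

    - None values are skipped, mirroring the report that null fields
      are not written.
    - A list becomes one element per item; an empty list becomes a single
      empty element unless the hypothetical omit_empty_arrays flag is set.
    """
    elem = ET.Element(tag)
    for name, value in row.items():
        _append(elem, name, value, omit_empty_arrays)
    return elem


def _append(parent, name, value, omit_empty_arrays):
    if value is None:
        return  # null field: no element at all
    if isinstance(value, list):
        if not value:
            if not omit_empty_arrays:
                ET.SubElement(parent, name)  # empty array -> <name />
            return
        for item in value:  # repeated elements, one per array item
            _append(parent, name, item, omit_empty_arrays)
        return
    if isinstance(value, dict):  # nested struct -> nested element
        child = ET.SubElement(parent, name)
        for key, sub in value.items():
            _append(child, key, sub, omit_empty_arrays)
        return
    ET.SubElement(parent, name).text = str(value)


row = {"id": 1, "note": None, "items": [], "orders": [{"sku": "A", "tags": []}]}
print(ET.tostring(row_to_xml("row", row), encoding="unicode"))
# <row><id>1</id><items /><orders><sku>A</sku><tags /></orders></row>
print(ET.tostring(row_to_xml("row", row, omit_empty_arrays=True), encoding="unicode"))
# <row><id>1</id><orders><sku>A</sku></orders></row>
```

Keeping the flag off by default preserves the existing output, in line with the preference not to change default behavior; the flag only affects empty lists, while `None` is always skipped.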
I'm using the library on a nested DataFrame, for example:
This is my schema:
This is my data:
What I would expect is something like:
But I get:
Has anyone run into the same issue? Is there a way to get the behaviour I want? I tried .option("ignoreNullFields", "true"), but I get the same output described above.
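Given the observation earlier in the thread that null fields are not written at all, one possible workaround is to turn empty arrays into nulls before writing. The sketch below shows that normalization on plain nested Python data (dicts and lists). The function name `empty_arrays_to_none` is hypothetical, and the assumption that the writer then omits those fields rests on the behaviour reported above; it is not verified here.

```python
def empty_arrays_to_none(value):
    """Recursively replace empty lists with None in nested dict/list data.

    Hypothetical pre-processing step: if null fields are omitted by the
    writer, converting empty arrays to null first should omit them too.
    """
    if isinstance(value, dict):
        return {key: empty_arrays_to_none(sub) for key, sub in value.items()}
    if isinstance(value, list):
        if not value:
            return None  # empty array -> null
        return [empty_arrays_to_none(item) for item in value]
    return value  # scalars and None pass through unchanged


row = {"id": 1, "items": [], "orders": [{"sku": "A", "tags": []}]}
print(empty_arrays_to_none(row))
# {'id': 1, 'items': None, 'orders': [{'sku': 'A', 'tags': None}]}
```

On a DataFrame, the same idea for a top-level array column could be expressed with `pyspark.sql.functions`, e.g. `F.when(F.size("col") == 0, F.lit(None)).otherwise(F.col("col"))`, where `col` is a placeholder column name. Arrays nested inside structs would need the enclosing struct rebuilt (or a higher-order function such as `transform`), which is more involved and untested here. Note also that a list containing only empty lists, such as `[[]]`, becomes `[None]` rather than `None`.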