ignore_schema in assert_df_equality removed in 9.3? #69

Open
MathiasHolmstrom opened this issue Sep 18, 2023 · 6 comments

Comments

@MathiasHolmstrom

I used this parameter in 9.2 but it's no longer there in 9.3. Why was this removed and does it mean I can't perform unit-tests without comparing types any longer?

@MrPowers
Owner

Yea, we had to remove this because it was a bad addition to the library (it didn't make sense after I thought about it deeper). Can you give me a better idea of what you're trying to accomplish, so I can see if it's possible with chispa or if the library should be modified? Thank you.

@MathiasHolmstrom
Author

I am comparing two dataframes and don't care about the types of the columns, so I want the assertion to pass even if the types are different. Is there another way of accomplishing this behavior?

@MrPowers
Owner

@Hiderdk - yea this should work: chispa.assert_basic_rows_equality(df1.collect(), df2.collect()). Let me know if that works for you.
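
A minimal sketch of why comparing collected rows sidesteps the schema check. It uses a plain-Python stand-in for Spark rows rather than a real SparkSession, and `rows_equal` is a hypothetical helper, not chispa's implementation. The underlying point is that `collect()` returns rows of plain Python values, so an IntegerType column and a LongType column both arrive as Python `int` and compare equal by value.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (an assumption for illustration):
# collected rows are tuple-like and hold plain Python values.
Row = namedtuple("Row", ["id", "name"])


def rows_equal(rows1, rows2):
    """Hypothetical helper: compare two lists of collected rows by value, in order."""
    return len(rows1) == len(rows2) and all(r1 == r2 for r1, r2 in zip(rows1, rows2))


# Imagine df1.id is IntegerType and df2.id is LongType.
# After collect(), both columns hold Python ints, so the declared types no longer matter.
rows_from_int_df = [Row(1, "a"), Row(2, "b")]
rows_from_long_df = [Row(1, "a"), Row(2, "b")]

assert rows_equal(rows_from_int_df, rows_from_long_df)
assert not rows_equal(rows_from_int_df, [Row(1, "a"), Row(3, "b")])
```

This only ignores type differences that disappear once values reach Python. A string column holding "1" still differs from an integer column holding 1.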

@ivanychev

@MrPowers first of all, thanks for the wonderful library. Why did you decide to change the API of this assertion in the minor version bump of the package?

This caused our tests to break. The convention is that minor version bumps don't change the API, so package managers (like poetry) update the deps to the latest minor version.

@ivanychev

We used this option a lot because we don't really care whether the column is IntegerType or LongType, but we do want to compare Spark DataFrames and use the other comparator options of assert_df_equality. Without it, we will need to add some boilerplate type casting in the test code to make the tests work again. It's a petty that you decided to remove it.

@MrPowers
Owner

@ivanychev - yea, I have the work-around that will meet your use case above.

> Why did you decide to change the API of this assertion in the minor version bump of the package?

We're using Semantic Versioning 2.0. Per the spec: "Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable."
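
The quoted rule can be sketched as a small check. This is an illustration only: `api_is_considered_stable` is a hypothetical helper, and the version strings assume the releases discussed here are 0.9.2 and 0.9.3.

```python
def api_is_considered_stable(version: str) -> bool:
    """Hypothetical helper: under SemVer 2.0, a 0.y.z release makes no API stability promise."""
    major = int(version.split(".")[0])
    return major >= 1


# Assumed version strings for the releases mentioned in this thread.
assert not api_is_considered_stable("0.9.2")
assert not api_is_considered_stable("0.9.3")
assert api_is_considered_stable("1.0.0")
```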

> It's a petty that you decided to remove it.

No, this wasn't petty. This option was causing bugs and breaking workflows, so we needed to remove it. I do my best to make all changes backwards compatible, but this one absolutely had to go because it was causing lots of issues.

> We used this option a lot because we don't really care whether the column is IntegerType or LongType but we do want to compare Spark DataFrames and use other comparator options of assert_df_equality

Feel free to propose another abstraction that's not breaking, not buggy, and will be a good addition for the entire chispa community 🚀
