chispa 1.0 release #93

Open
MrPowers opened this issue Feb 19, 2024 · 5 comments

@MrPowers
Owner

MrPowers commented Feb 19, 2024

It would be nice to develop chispa so we can make a 1.0 release.

We might even want to expose a different interface. Something like this:

@dataclass
class MyFormats:
    mismatched_rows = ["light_yellow"]
    matched_rows = ["cyan", "bold"]
    mismatched_cells = ["purple"]
    matched_cells = ["blue"]

my_chispa = Chispa(formats=MyFormats())

my_chispa.assert_df_equality(actual_df, expected_df)

The user could inject the my_chispa object in their tests as follows:

@pytest.fixture()
def my_chispa():
    return Chispa(formats=MyFormats())

def test_shows_assert_basic_rows_equality(my_chispa):
    ...
    my_chispa.assert_basic_rows_equality(df1.collect(), df2.collect())

It's worth contemplating at least.
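
A minimal, self-contained sketch of the injection pattern described above, runnable without Spark. It assumes a stand-in class rather than chispa's real implementation: `SketchChispa`, `make_my_chispa`, and the error message format are hypothetical names used only for illustration, and rows are plain tuples instead of Spark `Row` objects.

```python
from dataclasses import dataclass, field


@dataclass
class MyFormats:
    # Annotated list fields need default_factory because list defaults are mutable.
    mismatched_rows: list = field(default_factory=lambda: ["light_yellow"])
    matched_rows: list = field(default_factory=lambda: ["cyan", "bold"])
    mismatched_cells: list = field(default_factory=lambda: ["purple"])
    matched_cells: list = field(default_factory=lambda: ["blue"])


class SketchChispa:
    """Hypothetical stand-in for Chispa: holds formats and compares row lists."""

    def __init__(self, formats):
        self.formats = formats

    def assert_basic_rows_equality(self, rows1, rows2):
        # Pair rows positionally and collect every mismatch before failing.
        mismatches = [
            (i, r1, r2)
            for i, (r1, r2) in enumerate(zip(rows1, rows2))
            if r1 != r2
        ]
        if len(rows1) != len(rows2) or mismatches:
            raise AssertionError(
                f"rows differ (style={self.formats.mismatched_rows}): {mismatches}"
            )


def make_my_chispa():
    # In a test suite, this body would live inside a @pytest.fixture function.
    return SketchChispa(formats=MyFormats())
```

The point of the design is that formatting choices live on one object that is built once and handed to each test, so individual assertions don't need per-call formatting arguments.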

@MrPowers
Owner Author

Let's brainstorm some of the "big issues" with chispa:

  • bad for wide table DataFrame comparisons
  • doesn't handle some column types well
  • probably doesn't handle some edge cases well (e.g. array columns with NaN values)
  • user can't customize formatting
  • some bad abstractions (e.g. the underline_cells argument)
  • users can't disable terminal characters (sometimes users want to use chispa in a notebook and don't want any terminal formatting output)
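
To illustrate the NaN edge case from the list above: in Python, `float("nan") != float("nan")`, so a naive equality check on array cells can report a mismatch between rows that look identical. Below is a minimal sketch of a NaN-aware comparison. `cells_equal` is a hypothetical helper for illustration, not a chispa function.

```python
import math


def cells_equal(a, b):
    """Compare two cell values, treating NaN as equal to NaN."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
        return a == b
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        # Recurse so NaN values nested inside array-like cells are handled too.
        return len(a) == len(b) and all(cells_equal(x, y) for x, y in zip(a, b))
    return a == b
```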

Here are some project goals:

  • maintain backward compatibility whenever possible
  • output beautiful error messages and make it easier for users to unit test their PySpark code
  • allow users to run unit tests in a performant manner

For chispa 1.0, it might be better to build new interfaces rather than modify the existing interfaces. But I'd rather not make chispa 1.0 backward incompatible. Let's align on vision & interfaces.

@SemyonSinchenko
Collaborator

> For chispa 1.0, it might be better to build new interfaces rather than modify the existing interfaces. But I'd rather not make chispa 1.0 backward incompatible. Let's align on vision & interfaces.

Why not add a new API and keep the old one, only raising DeprecationWarnings? Or even just create a chispa.v2 API.
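
A rough sketch of the deprecation approach, assuming hypothetical function names (`assert_rows_equal` for the old entry point, `assert_rows_equal_v2` for the new one). Neither name is taken from chispa; only `warnings.warn` and `DeprecationWarning` come from the standard library.

```python
import warnings


def assert_rows_equal_v2(rows1, rows2):
    """Hypothetical new-style entry point."""
    if rows1 != rows2:
        raise AssertionError(f"rows differ: {rows1} != {rows2}")


def assert_rows_equal(rows1, rows2):
    """Hypothetical legacy entry point, kept for backward compatibility."""
    warnings.warn(
        "assert_rows_equal is deprecated; use assert_rows_equal_v2",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    # Delegate so both entry points share a single implementation.
    return assert_rows_equal_v2(rows1, rows2)
```

Existing callers keep working and only see a warning, while the old function becomes a thin wrapper over the new one.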

@MrPowers
Owner Author

Yep, I already started building that new interface with Chispa(formats=MyFormats()). We may want to expose the public API via Chispa going forward. I think we just need to figure out exactly which public interface we want to expose to end users. It should meet all the project goals, be flexible enough to allow for customizations, and be easy to run with the defaults.

@fpgmaas
Collaborator

fpgmaas commented Jul 19, 2024

> user can't customize formatting

> I already started building that new interface with Chispa(formats=MyFormats()). [...]

@MrPowers For a proposed new way to configure formatting, see #127, which would change the user-facing interface to, e.g.:

Chispa(
    formats=FormattingConfig(
        mismatched_rows={"color": "light_yellow"}
    )
)

@fpgmaas
Collaborator

fpgmaas commented Jul 19, 2024

I think the best way to move forward is to simply create separate issues for the following topics:

  • bad for wide table DataFrame comparisons
  • doesn't handle some column types well
  • probably doesn't handle some edge cases well (e.g. array columns with NaN values)
  • user can't customize formatting
  • some bad abstractions (e.g. the underline_cells argument)
  • users can't disable terminal characters (sometimes users want to use this in a notebook and don't want any terminal formatting output)

That way we can discuss them separately. We can add them to the milestone for a 1.0 release and ship features and changes one by one by incrementing the minor version. Once all desired changes and features for 1.0 are finished, we release it.
