Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tree string #95

Merged
merged 25 commits into from
Oct 12, 2024
Merged

Add tree string #95

merged 25 commits into from
Oct 12, 2024

Conversation

jeffbrennan
Copy link
Contributor

@jeffbrennan jeffbrennan commented Mar 14, 2024

addresses #88

See comment below

@jeffbrennan
Copy link
Contributor Author

jeffbrennan commented Mar 18, 2024

hey @MrPowers, this is ready for review. I based the output on how pyspark prints schemas with the printSchema() function. Verified that this approach works with multiple levels of nested structs, schemas of different lengths, ignore_nullable T/F, and ignore_metadata T/F

Standard example:
image

Different lengths:
image

Multiple nested structs:
image


Some thoughts for additional features

  • Currently this auto-determines whether the output should be a table or a tree string based on whether the provided schemas have nested structs or > 10 columns. Should this be user-configurable?
  • Truncation of wide tree string outputs. If a tree string line is wider than a certain threshold (~80 characters), should I just truncate the right schema output, or both the left and right evenly?
  • Output looks good in the terminal but not in VSCode when an exception is caught. Should we just have a standard SchemasNotEqualError message and log the diff tree/table separately in the terminal?
image
  • ignore_metadata = False is a little confusing currently (see name column in example below). Should we add a text indicator to explain that two lines are differing because of their metadata?
image

@jeffbrennan jeffbrennan changed the title [WIP] Add tree string Add tree string Mar 18, 2024
Copy link
Collaborator

@SemyonSinchenko SemyonSinchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kleemaj
Copy link

kleemaj commented Sep 23, 2024

This fix is super helpful! Any chance it can get merged? Happy to try resolving the merge conflicts if you need help.

@jeffbrennan
Copy link
Contributor Author

This fix is super helpful! Any chance it can get merged? Happy to try resolving the merge conflicts if you need help.

thanks for putting this back on my radar! I'll work on resolving the merge conflicts

@jeffbrennan
Copy link
Contributor Author

@fpgmaas @SemyonSinchenko could one of y'all please review the changes after my recent updates?

For the merge conflicts, most of the differences were formatting/line length related and I took the incoming version from main in those instances.

I also experienced some pytest failures after the merge that I have addressed in the subsequent commits.

  • reference to the six package in schema_comparer.py but not in pyproject.toml or the imports at the top of the file
  • missing import of create_schema_comparison_tree in test_schema_comparer.py
  • failure on passing a structfield to the blue() function
  • missing bcolors imports in schema_comparer.py

Let me know if anything should be changed!

@@ -3,9 +3,11 @@
import typing
from itertools import zip_longest

import six
Copy link
Collaborator

@fpgmaas fpgmaas Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need six; We can use itertools.zip_longest from the standard library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

@fpgmaas
Copy link
Collaborator

fpgmaas commented Sep 25, 2024

@jeffbrennan build is failing due to formatting issues. Could you run the pre-commit hooks locally? Sorry, we should reflect this in CONTRIBUTING.md

@jeffbrennan
Copy link
Contributor Author

jeffbrennan commented Sep 25, 2024

@jeffbrennan build is failing due to formatting issues. Could you run the pre-commit hooks locally? Sorry, we should reflect this in CONTRIBUTING.md

@fpgmaas passing locally after my recent changes
image

@jeffbrennan
Copy link
Contributor Author

hi @fpgmaas, pinging you again to see if we can get this merged

@fpgmaas fpgmaas merged commit 50c2411 into MrPowers:main Oct 12, 2024
15 checks passed
@jeffbrennan jeffbrennan deleted the add-tree-string2 branch October 14, 2024 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants