Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

describe() fixes #937

Merged
merged 18 commits into from
Nov 21, 2024
Merged

describe() fixes #937

merged 18 commits into from
Nov 21, 2024

Conversation

Jolanrensen
Copy link
Collaborator

@Jolanrensen Jolanrensen commented Oct 29, 2024

Fixes #558 : Describe breaks on Number column

I'll first fix some of the most annoying bugs which can be found with describe(), more changes or fixes will come later.

Fixed:

  • created col.isInterComparable() instead of col.isComparable() to avoid confusion
  • cumsum:
    • added Short, Byte, BigInteger, Nothing support
    • (Short/Byte are supposed to become Int, similar to sum(), but they are temporarily converted back to Short/Byte so that the DataColumn.cumSum function returns the right type)
    • added extra tests
  • mean:
    • added BigInteger support
    • added TODOs for BigDecimal and BigInteger, they're supposed to result in BigDecimal, not Double
  • median:
    • added Float, Short, BigInteger support
    • Number support is still missing as it's not Comparable, will be handled later
    • added TODOs for conversion, will happen later
  • min/max:
    • removed unused internal code
    • Number support is still missing as it's not Comparable, will be handled later
  • std:
    • added BigInteger, Number support
    • added test for Byte
    • BigDecimal and BigInteger are supposed to result in BigDecimal, not Double, will be handled later
  • sum:
    • added Float, Short, Byte, BigInteger, Number, Nothing support
    • added some tests
    • conversions will be handled later
  • describe:
    • fixed @DataSchema type being String instead of KType
    • some cleaning up in the implementation
    • min, median, and max now also show up for Number columns with different types :) Numbers will be converted to double or BigDecimal first so they can be compared.

@Jolanrensen Jolanrensen force-pushed the statistics-fixes branch 2 times, most recently from dc4687d to 6e028f1 Compare November 7, 2024 13:56
…ng them to either double or bigdecimal) and added tests
@Jolanrensen Jolanrensen changed the title Statistics fixes describe() fixes Nov 8, 2024
@Jolanrensen Jolanrensen marked this pull request as ready for review November 15, 2024 12:20
@Jolanrensen Jolanrensen requested review from zaleslaw and koperagen and removed request for zaleslaw November 15, 2024 12:20
@koperagen
Copy link
Collaborator

Could you please add a test for #352? Maybe even if something else covers it, good to have one that is closer to use case

*/
public fun AnyCol.isComparable(): Boolean =
public fun AnyCol.isInterComparable(): Boolean =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isInterComparable is a new / not widespread term, for me it raises some questions like what's the difference (if i were to see it for the first time)
For users isComparable can be described as "filter columns for which max(), min(), median() and other (we actually only have 3) order operations make sense"
You wouldn't expect to find max() value for mixed column of String and Double values, so it should make things clearer

Copy link
Collaborator Author

@Jolanrensen Jolanrensen Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's actually the reason I changed it, because for me isComparable can be interpreted in two ways:

  • "is comparable": so can call max(), min(), etc., not true for mixed number columns
  • "is Comparable"; implements the interface, which would be true for mixed number columns

you need KDocs to explain what it actually means (and past-me didn't do a good job at that either). inter-comparable means "comparable inside" (or something, I came up with it too), which narrows down the meaning to the first. Of course, other name suggestions are very welcome if you can think of any :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With suggestion of ChatGPT I'll rename it to .valuesAreComparable(). Would that be okay?

@Jolanrensen
Copy link
Collaborator Author

Could you please add a test for #352? Maybe even if something else covers it, good to have one that is closer to use case

Very good comment. After I added a test I noticed this PR didn't actually fully fix #352. It just fixed the error message. The actual issue needs to be solved separately. I will reorganize things a bit

@Jolanrensen Jolanrensen merged commit c01ff02 into master Nov 21, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Describe breaks on Number column
3 participants