Add support for percentiles #543
Would that be comparable to
Yes, that would be ideal.
I was looking a bit into this because it sounded like something that should be easy to add, but I ran into a few peculiarities. This is the implementation for
For some reason, it doesn't handle
Also, the code for
This raises the question of what to do with a … Would it make sense to be able to specify the return value as well, or should people convert their … I.e., something like this would be the most flexible:
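For comparison, one way to keep the result in the column's own type, so that no return type has to be specified, is the nearest-rank definition. It always returns one of the input elements instead of interpolating between two of them. The helper below is a hypothetical plain-Kotlin sketch. The name `percentileNearestRank` is an assumption: it is neither a DataFrame API nor the snippet referenced in this comment.

```kotlin
import kotlin.math.ceil

/**
 * Hypothetical helper: nearest-rank percentile for any Comparable type.
 * [p] is a fraction in 0.0..1.0. Returns null for an empty input.
 * Because the result is one of the input elements, Int in gives Int out,
 * BigDecimal in gives BigDecimal out, and so on.
 */
fun <T : Comparable<T>> percentileNearestRank(values: Iterable<T>, p: Double): T? {
    require(p in 0.0..1.0) { "p must be in [0, 1], was $p" }
    val sorted = values.sorted()
    if (sorted.isEmpty()) return null
    // Nearest rank: the smallest 1-based rank r with r >= p * n (and at least 1).
    val rank = ceil(p * sorted.size).toInt().coerceAtLeast(1)
    return sorted[rank - 1]
}
```

For example, `percentileNearestRank(listOf(15, 20, 35, 40, 50), 0.5)` returns the `Int` 35. The trade-off is that it never interpolates, so on even-sized inputs it differs from interpolating definitions such as the usual median (for `listOf(1, 2, 3, 4)` it returns 2, not 2.5).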
I believe the right way to approach this is to make the quantile result always … Also, it's worth mentioning that for large data frames it would be beneficial to request multiple quantiles in one pass, such as:
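A minimal plain-Kotlin sketch of requesting several quantiles at once, assuming values are widened to Double and that linear interpolation between neighboring ranks is used. The function name `quantiles` is hypothetical; this is neither the snippet proposed in the comment nor an existing DataFrame API. The data is sorted once, and every requested quantile is then read off the same sorted list:

```kotlin
import kotlin.math.ceil
import kotlin.math.floor

/**
 * Hypothetical helper: computes several quantiles (fractions in 0.0..1.0)
 * from a single sort of the input, using linear interpolation between
 * the two closest ranks.
 */
fun quantiles(values: Iterable<Number>, vararg ps: Double): List<Double> {
    require(ps.all { it in 0.0..1.0 }) { "quantiles must be in [0, 1]" }
    // One O(n log n) sort shared by all requested quantiles.
    val sorted = values.map { it.toDouble() }.sorted()
    require(sorted.isNotEmpty()) { "cannot compute quantiles of an empty input" }
    return ps.map { p ->
        val h = (sorted.size - 1) * p          // fractional 0-based rank
        val lo = floor(h).toInt()
        val hi = ceil(h).toInt()
        sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
    }
}
```

For instance, `quantiles(listOf(4, 1, 3, 2), 0.0, 0.5, 1.0)` returns `[1.0, 2.5, 4.0]`. Sorting dominates the cost, so k quantiles cost one sort plus O(k) lookups rather than k separate sorts.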
For what it is worth, … So it looks like the API is going to be inconsistent no matter the solution, at least if we want to avoid a breaking change.
I agree. It seems that if BigDecimal support is needed, it may require special treatment, but everything else should be covered by Double. Another option is to make quantiles always return BigDecimal for consistency, but that might be overkill.
@cmelchior I indeed also found that we're missing some types in (some of) the statistics functions, see #558. I think returning Doubles everywhere is fine, as that's what many libraries use. BigDecimal/BigInteger are Java-specific, so I don't mind if they get their own special treatment. That might also make the switch to multiplatform easier if we ever attempt it.
@Jolanrensen What is your stance on breaking changes? Would it be okay to change … Since …
@cmelchior Double, Int, Long, Byte, Short, Float -> median as Double. It might be best to implement this in a separate PR first, though, as it solves a specific part of #558 and it's indeed a breaking change, but for the better :) (hopefully, haha)
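A sketch of the split described in this thread, assuming primitive numeric inputs are widened to Double while BigDecimal gets its own exact variant. Both function names are hypothetical and are not DataFrame APIs:

```kotlin
import java.math.BigDecimal

/** Hypothetical: median of Double, Int, Long, Byte, Short or Float values, returned as Double. */
fun medianAsDouble(values: Iterable<Number>): Double? {
    // Widen everything to Double; note that Long values above 2^53 lose precision here.
    val sorted = values.map { it.toDouble() }.sorted()
    if (sorted.isEmpty()) return null
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2
}

/** Hypothetical: exact median for BigDecimal, kept separate so no precision is lost. */
fun medianExact(values: Iterable<BigDecimal>): BigDecimal? {
    val sorted = values.sorted()
    if (sorted.isEmpty()) return null
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) {
        sorted[mid]
    } else {
        // Halving a finite decimal always terminates, so divide() cannot throw here.
        (sorted[mid - 1] + sorted[mid]).divide(BigDecimal(2))
    }
}
```

Keeping the BigDecimal variant under its own name (rather than as an overload taking `Iterable<BigDecimal>`) avoids a JVM signature clash, since both parameter types erase to `Iterable`.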
Here is my current workaround, which could be helpful:
That allows me to use it standalone and within …
It appears that the API already supports `median()` and `medianFor()`, but not arbitrary percentiles. To make it on par with other DataFrame APIs, it would be desirable to have support for `percentile(percentile = 0.95)`, etc.
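As a point of reference for the requested API, here is a minimal sketch of a single percentile in plain Kotlin. It assumes `percentile` is given as a fraction (0.95 for the 95th percentile) and uses linear interpolation between the two closest ranks, which is the same rule as NumPy's default percentile method. The standalone function is hypothetical and not part of DataFrame:

```kotlin
import kotlin.math.ceil
import kotlin.math.floor

/**
 * Hypothetical helper: the [percentile]-th quantile (a fraction in 0.0..1.0)
 * of [values], widened to Double, using linear interpolation between the
 * two closest ranks. Returns null for an empty input.
 */
fun percentile(values: Iterable<Number>, percentile: Double): Double? {
    require(percentile in 0.0..1.0) { "percentile must be in [0, 1], was $percentile" }
    val sorted = values.map { it.toDouble() }.sorted()
    if (sorted.isEmpty()) return null
    val h = (sorted.size - 1) * percentile   // fractional 0-based rank
    val lo = floor(h).toInt()
    val hi = ceil(h).toInt()
    // Interpolate between the neighbors; when h is a whole number, lo == hi.
    return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
}
```

For example, `percentile(listOf(1, 2, 3, 4), percentile = 0.5)` gives 2.5, and with `percentile = 0.95` it gives approximately 3.85.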