feat: Add array_min function support #14417
First of all, I'm not sure whether this function should be in datafusion core or datafusion-functions-extra. It doesn't seem to be a "core" function that is supported in both Postgres and DuckDB. Since we are going to support Spark functions, maybe we should move this function there: #5600
DuckDB has list_max, and our array semantics are supposed to model Duck's list semantics, thus it makes sense to add to DataFusion core.
Review-wise, let's do array_max well in #14470 and then return to this PR. It doesn't make sense to review the two in parallel, since most of the comments will be the same. For example, this PR still uses sort to get the minimal element.
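To illustrate the point about sorting, here is a minimal sketch contrasting the two approaches on a plain slice. It is not the PR's code: the function names `min_by_sort` and `min_by_scan` are hypothetical, the input is modeled as `&[Option<i64>]` rather than an Arrow array, and the NULL handling (NULL elements skipped, empty or all-NULL input yields no value) is an assumption borrowed from Spark's documented `array_min` behavior.

```rust
/// Hypothetical helper: minimum via sorting.
/// Costs O(n log n) time and allocates a temporary vector.
fn min_by_sort(values: &[Option<i64>]) -> Option<i64> {
    // Drop NULLs (None), copy the remaining values, then sort them.
    let mut non_null: Vec<i64> = values.iter().flatten().copied().collect();
    non_null.sort_unstable();
    // The smallest element is first; an empty vector yields None.
    non_null.first().copied()
}

/// Hypothetical helper: minimum via a single pass.
/// Costs O(n) time with no allocation.
fn min_by_scan(values: &[Option<i64>]) -> Option<i64> {
    // `flatten` skips None entries; `min` returns None for an empty iterator.
    values.iter().flatten().copied().min()
}
```

Both functions return the same result for any input; the scan simply avoids the sort and the temporary allocation.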
I would actually recommend closing this PR and creating a fresh one once array_max gets in, to avoid using an old copy of the code. For
DuckDB offers many array functions, but that doesn’t mean we need to port all of them to DataFusion Core. Our focus should be on functions that are already supported in PostgreSQL (which are a must-have) or those with significant user interest that justify ongoing maintenance in DataFusion Core.
Thanks @jayzhan211 and @findepi for the reviews.
Which issue does this PR close?
Closes #14416.
What changes are included in this PR?
Currently, Spark, Snowflake and Presto support the array_min function. It can also be useful for DataFusion.
Spark: https://docs.databricks.com/en/sql/language-manual/functions/array_min.html
Snowflake: https://docs.snowflake.com/en/sql-reference/functions/array_min
Presto: https://prestodb.io/docs/current/functions/array.html#array_min-x-x
All potential use cases have been covered, such as different data_types, empty array, NULL, etc. Also planning to add an array_max function as a follow-up.
Are these changes tested?
Added new UT cases to verify the array_min function against different source arrays.
Are there any user-facing changes?
Yes, a new SQL function is supported and the documentation has also been updated.