feat: Add array_min function support #14417

Open: wants to merge 2 commits into base: main


erenavsarogullari
Member

Which issue does this PR close?

Closes #14416.

What changes are included in this PR?

Spark, Snowflake and Presto already support an array_min function, which returns the smallest non-NULL element of an array. It would be useful in DataFusion as well.

array_min(make_array(3,1,4,2)) => 1
array_min(make_array('h','e','l','l',NULL,'o')) => e
array_min(make_array(NULL,NULL)) => NULL
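The semantics shown in these examples can be sketched in plain Rust. This is only an illustration, not the code in this PR: `array_min` here is a hypothetical free function, a list is modelled as a slice of `Option<T>` with `None` standing in for SQL NULL, and returning `None` for an empty list is an assumption, since the description does not state that result.

```rust
/// Hypothetical helper (not DataFusion's API): returns the smallest non-NULL
/// element of a list modelled as a slice of `Option<T>`, where `None` stands
/// for SQL NULL.
///
/// NULL elements are skipped. A list with no non-NULL elements yields `None`;
/// treating the empty list the same way is an assumption of this sketch.
fn array_min<T: Ord + Copy>(values: &[Option<T>]) -> Option<T> {
    values
        .iter()
        .copied()   // Option<T> by value
        .flatten()  // drop the NULLs, keep the inner values
        .min()      // None if nothing is left
}
```

Under these assumptions, `array_min(&[Some(3), Some(1), Some(4), Some(2)])` mirrors the first example above, and a slice of only `None` values mirrors the last one.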

Spark: https://docs.databricks.com/en/sql/language-manual/functions/array_min.html
Snowflake: https://docs.snowflake.com/en/sql-reference/functions/array_min
Presto: https://prestodb.io/docs/current/functions/array.html#array_min-x-x

The tests cover the main use cases, including different data types, empty arrays, and NULL values.

I also plan to add an array_max function as a follow-up.

Are these changes tested?

Yes. New unit test cases verify array_min against a range of source arrays.

Are there any user-facing changes?

Yes, a new SQL function is supported and the documentation has also been updated.

github-actions bot added the documentation and sqllogictest labels on Feb 3, 2025
erenavsarogullari force-pushed the array_min_function branch 3 times, most recently from 5640e5d to 639f8ce, on Feb 4, 2025 02:37
erenavsarogullari changed the title from "feat: Add array_min function" to "feat: Add array_min function support" on Feb 4, 2025
erenavsarogullari force-pushed the array_min_function branch 2 times, most recently from c696cbb to 9beb8b7, on Feb 6, 2025 05:00
@jayzhan211
Contributor

jayzhan211 commented Feb 7, 2025

First of all, I'm not sure whether this function belongs in DataFusion core or in datafusion-functions-extra. It does not seem to be a "core" function, in the sense of one supported by both Postgres and DuckDB.

Since we are going to support Spark functions, maybe we should move this function there: #5600

Member

@findepi findepi left a comment


DuckDB has list_max, and our array semantics are supposed to model Duck's list semantics, thus it makes sense to add to DataFusion core.

Review-wise, let's do array_max well in #14470 and then return to this PR. It doesn't make sense to review the two in parallel, since most of the comments will be the same. For example, this PR still uses sort to get the minimal element.
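To illustrate why sorting is worth avoiding here, the sketch below contrasts the two approaches on a plain slice. It is not the code from either PR: `min_by_sort` and `min_by_scan` are hypothetical names, and the list is modelled as `&[Option<i64>]` with `None` standing in for NULL.

```rust
/// Sort-based approach (hypothetical): copies the non-NULL values into a
/// scratch vector, sorts it, and takes the first element.
/// Costs O(n log n) time plus an allocation.
fn min_by_sort(values: &[Option<i64>]) -> Option<i64> {
    let mut present: Vec<i64> = values.iter().copied().flatten().collect();
    present.sort_unstable();
    present.first().copied()
}

/// Single-pass approach (hypothetical): tracks the smallest non-NULL value
/// seen so far. Costs O(n) time and needs no allocation.
fn min_by_scan(values: &[Option<i64>]) -> Option<i64> {
    let mut best: Option<i64> = None;
    for v in values.iter().copied().flatten() {
        best = Some(match best {
            Some(b) if b <= v => b, // current minimum still holds
            _ => v,                 // first value, or a new minimum
        });
    }
    best
}
```

Both functions return the same result for every input; only the cost differs.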

I would actually recommend closing this PR and creating a fresh one once array_max gets in, to avoid working from an old copy of the code. For

@jayzhan211
Contributor

jayzhan211 commented Feb 7, 2025

> DuckDB has list_max, and our array semantics are supposed to model Duck's list semantics, thus it makes sense to add to DataFusion core.
>
> Review-wise, let's do array_max well in #14470 and then return to this PR. It doesn't make sense to review the two in parallel, since most of the comments will be the same. For example, this PR still uses sort to get minimal element.
>
> I would actually recommend closing this PR and creating a new afresh once array_max gets in, to avoid using old copy of the code. For

DuckDB offers many array functions, but that doesn’t mean we need to port all of them to DataFusion Core. Our focus should be on functions that are already supported in PostgreSQL (which are a must-have) or those with significant user interest that justify ongoing maintenance in DataFusion Core.

@erenavsarogullari
Member Author

Thanks @jayzhan211 and @findepi for the reviews.
I have updated this PR to reflect the earlier feedback on the array_max PR: #14470
Please also see my comment on which module both functions should live in: #14470 (comment)

Labels
documentation Improvements or additions to documentation functions sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Add array_min function support
3 participants