Add support for DISTINCT + ORDER BY in ARRAY_AGG #14413

Open
wants to merge 1 commit into
base: main


gabotechs
Contributor

@gabotechs gabotechs commented Feb 2, 2025

Which issue does this PR close?

Closes #12371.

Rationale for this change

Completes the ARRAY_AGG functionality, which is a prerequisite for adding full STRING_AGG functionality in #14412.

What changes are included in this PR?

Adds Postgres-style support for combining DISTINCT and ORDER BY, allowing users to issue statements like:

SELECT ARRAY_AGG(DISTINCT col ORDER BY col) FROM table;

Note that there's a limitation that prohibits ordering by an expression that is not the same as the ARRAY_AGG argument. For example, the following queries are invalid:

SELECT ARRAY_AGG(DISTINCT col ORDER BY other_col) FROM table; ❌
SELECT ARRAY_AGG(DISTINCT col ORDER BY concat(col, '')) FROM table; ❌

The same limitation exists in Postgres; see the example in the Postgres fiddle.
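To make the semantics and the limitation concrete, here is a minimal sketch in plain Rust, not DataFusion code. The function `array_agg_distinct_ordered` is a hypothetical name, and the sketch assumes a non-null integer column. Because the ordering key is the aggregated value itself, deduplicating and sorting can be done on the same set. If the key were another column, a value that survives deduplication could be paired with several different keys, leaving its position undefined.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of DataFusion): models
/// `ARRAY_AGG(DISTINCT col ORDER BY col [DESC])` over a non-null column.
fn array_agg_distinct_ordered(col: &[i64], descending: bool) -> Vec<i64> {
    // A BTreeSet drops duplicates and iterates in ascending order,
    // so DISTINCT and ORDER BY on the same expression share one structure.
    let distinct: BTreeSet<i64> = col.iter().copied().collect();
    let mut out: Vec<i64> = distinct.into_iter().collect();
    if descending {
        out.reverse();
    }
    out
}
```

Calling it with `&[2, 1, 2, 3, 1]` and `descending = false` yields `[1, 2, 3]`.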

Are these changes tested?

Yes, in both unit tests and sqllogictests.

Are there any user-facing changes?

Users will now be able to issue ARRAY_AGG calls that combine DISTINCT and ORDER BY clauses.

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) functions labels Feb 2, 2025
@gabotechs gabotechs force-pushed the array-agg-distinct-order-by branch from 9400d72 to 8eaacd6 Compare February 2, 2025 17:34
@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Feb 2, 2025
@gabotechs gabotechs force-pushed the array-agg-distinct-order-by branch from 8eaacd6 to f23681c Compare February 2, 2025 17:37
@gabotechs gabotechs changed the title Add support for DISTINCT and ORDER BY in ARRAY_AGG Add support for DISTINCT + ORDER BY in ARRAY_AGG Feb 2, 2025
@@ -193,3 +193,149 @@ pub fn merge_ordered_arrays(

Ok((merged_values, merged_orderings))
}

#[cfg(test)]
mod tests {
Contributor Author


I moved these tests from

to this file, as they are just testing the merge_ordered_arrays function present in this file, and nothing related to ARRAY_AGG.

Since I added some unit tests there that do test ARRAY_AGG, I thought it would be a good idea to move these to a more suitable place.
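For readers unfamiliar with what the moved tests exercise, the following is a simplified sketch of the general idea of merging already-ordered inputs. It is not the real `merge_ordered_arrays`, which per the diff above returns both merged values and merged orderings. The name `merge_sorted`, the `i64` element type, and the ascending-only order are assumptions made for illustration.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Simplified stand-in for the idea behind `merge_ordered_arrays`:
/// merges inputs that are each sorted ascending into one sorted output.
/// This does not reproduce the real function's signature.
fn merge_sorted(inputs: &[Vec<i64>]) -> Vec<i64> {
    // Heap entries are (value, input index, position within that input).
    // `Reverse` turns the max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    for (i, input) in inputs.iter().enumerate() {
        if let Some(&first) = input.first() {
            heap.push(Reverse((first, i, 0usize)));
        }
    }

    let mut merged = Vec::with_capacity(inputs.iter().map(Vec::len).sum());
    while let Some(Reverse((value, i, pos))) = heap.pop() {
        merged.push(value);
        // Advance within the input that just produced the smallest value.
        if let Some(&next) = inputs[i].get(pos + 1) {
            heap.push(Reverse((next, i, pos + 1)));
        }
    }
    merged
}
```

Such a merge depends only on the inputs being ordered, not on any aggregate function, which is consistent with keeping its tests next to the function itself.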

Comment on lines -342 to +374
-        if values.len() != 1 {
-            return internal_err!("expects single batch");
+        if values.is_empty() {
+            return Ok(());
Contributor Author


Because of the ordering, the distinct accumulator can now receive more than one batch, so this restriction had to be removed.
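The effect of swapping the length check for an emptiness check can be sketched with a toy accumulator. `DistinctAccumulator` below is a hypothetical type, not DataFusion's, and it works on plain `Vec<i64>` inputs. How the real accumulator uses the additional entries is not visible in this excerpt; the sketch simply folds every entry into the distinct set.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified distinct accumulator (not DataFusion's type).
#[derive(Default)]
struct DistinctAccumulator {
    seen: BTreeSet<i64>,
}

impl DistinctAccumulator {
    /// Accepts any number of input arrays, including none.
    fn update_batch(&mut self, values: &[Vec<i64>]) -> Result<(), String> {
        // Previously anything other than exactly one entry was an error;
        // here an empty input is a no-op and several entries are allowed.
        if values.is_empty() {
            return Ok(());
        }
        for array in values {
            self.seen.extend(array.iter().copied());
        }
        Ok(())
    }

    /// Returns the distinct values seen so far, in ascending order.
    fn evaluate(&self) -> Vec<i64> {
        self.seen.iter().copied().collect()
    }
}
```

With this shape, an empty update leaves the state unchanged, and updating with `[3, 1, 3]` and `[2, 1]` leaves `evaluate()` returning `[1, 2, 3]`.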

Labels
functions sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

array_agg cannot perform both distinct and order_by