Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: support array_compact function #1321

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

kazantsev-maksim
Copy link

Which issue does this PR close?

Related to Epic: #1042
array_compact: SELECT array_compact(array(1, 2, 3, null)) => array(1, 2, 3)
DataFusion' s array_compact has same behavior with Spark 's array_compact function
Spark: https://docs.databricks.com/en/sql/language-manual/functions/array_compact.html
DataFusion: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#array-remove-all

Rationale for this change

Defined under Epic: #1042

What changes are included in this PR?

planner.rs: Maps Spark 's arrays_compact function to DataFusion array_remove_all_udf physical expression from Spark physical expression
expr.proto: arrays_compact message has been added,
QueryPlanSerde.scala: arrays_compact pattern matching case has been added,
CometExpressionSuite.scala: A new UT has been added for arrays_compact function.

How are these changes tested?

A new UT has been added.

Kazantsev Maksim added 2 commits January 22, 2025 10:11
# Conflicts:
#	native/core/src/execution/planner.rs
#	native/proto/src/proto/expr.proto
#	spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
#	spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
@kazantsev-maksim kazantsev-maksim changed the title Array compact Feat: support array_compact function Jan 22, 2025
@kazantsev-maksim kazantsev-maksim deleted the array_compact branch January 23, 2025 09:33
Kazantsev Maksim added 3 commits January 24, 2025 09:35
# Conflicts:
#	native/core/src/execution/planner.rs
#	native/proto/src/proto/expr.proto
#	spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
#	spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
@kazantsev-maksim kazantsev-maksim changed the title Feat: support array_compact function Draft: support array_compact function Jan 24, 2025
@kazantsev-maksim kazantsev-maksim changed the title Draft: support array_compact function Feat: support array_compact function Jan 25, 2025
Kazantsev Maksim added 2 commits January 25, 2025 21:16
# Conflicts:
#	spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
.setArrayCompact(arrayCompactBuilder)
.build())
} else {
None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a call to withInfo in the case where elementType is not supported so that the user sees the reason why it is not supported. There are examples of this for the other array functions.

Comment on lines +2713 to +2716
checkSparkAnswerAndOperator(
sql("SELECT array_compact(array(_2)) FROM t1 WHERE _2 IS NULL"))
checkSparkAnswerAndOperator(
sql("SELECT array_compact(array(_2)) FROM t1 WHERE _2 IS NOT NULL"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see tests covering the case where the array contains both null and non null values, but we can address that as part of #1269.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the disadvantages of makeParquetFileAllTypes is that if _2 is null, then all other columns are also null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for more coverage

@@ -2428,6 +2428,22 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with CometExprShim
withInfo(expr, "unsupported arguments for ArrayJoin", exprs: _*)
None
}
case expr @ ArrayFilter(child, _) if ArrayCompact(child).replacement.sql == expr.sql =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you add a flag to enable and disable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the PR only contains basic tests, could you add a check to enable this expression only if CometConf.COMET_CAST_ALLOW_INCOMPATIBLE is enabled? We can remove this check in a future PR that adds comprehensive tests and demonstrates that we have Spark-compatible behavior for all supported data types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants