-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: support array_compact function #1321
base: main
Are you sure you want to change the base?
Conversation
# Conflicts: # native/core/src/execution/planner.rs # native/proto/src/proto/expr.proto # spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala # spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
# Conflicts: # native/core/src/execution/planner.rs # native/proto/src/proto/expr.proto # spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala # spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
# Conflicts: # spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
.setArrayCompact(arrayCompactBuilder) | ||
.build()) | ||
} else { | ||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a call to withInfo
in the case where elementType
is not supported so that the user sees the reason why it is not supported. There are examples of this for the other array functions.
checkSparkAnswerAndOperator( | ||
sql("SELECT array_compact(array(_2)) FROM t1 WHERE _2 IS NULL")) | ||
checkSparkAnswerAndOperator( | ||
sql("SELECT array_compact(array(_2)) FROM t1 WHERE _2 IS NOT NULL")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see tests covering the case where the array contains both null and non null values, but we can address that as part of #1269.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the disadvantages of makeParquetFileAllTypes
is that if _2
is null, then all other columns are also null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for more coverage
@@ -2428,6 +2428,22 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with CometExprShim | |||
withInfo(expr, "unsupported arguments for ArrayJoin", exprs: _*) | |||
None | |||
} | |||
case expr @ ArrayFilter(child, _) if ArrayCompact(child).replacement.sql == expr.sql => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add a flag to enable and disable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the PR only contains basic tests, could you add a check to enable this expression only if CometConf.COMET_CAST_ALLOW_INCOMPATIBLE
is enabled? We can remove this check in a future PR that adds comprehensive tests and demonstrates that we have Spark-compatible behavior for all supported data types.
Which issue does this PR close?
Related to Epic: #1042
array_compact: SELECT array_compact(array(1, 2, 3, null)) => array(1, 2, 3)
DataFusion' s array_compact has same behavior with Spark 's array_compact function
Spark: https://docs.databricks.com/en/sql/language-manual/functions/array_compact.html
DataFusion: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#array-remove-all
Rationale for this change
Defined under Epic: #1042
What changes are included in this PR?
planner.rs: Maps Spark 's arrays_compact function to DataFusion array_remove_all_udf physical expression from Spark physical expression
expr.proto: arrays_compact message has been added,
QueryPlanSerde.scala: arrays_compact pattern matching case has been added,
CometExpressionSuite.scala: A new UT has been added for arrays_compact function.
How are these changes tested?
A new UT has been added.