Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1] Refactor representation of set ops #1538

Merged
merged 9 commits into from
Aug 21, 2024
Merged

[v1] Refactor representation of set ops #1538

merged 9 commits into from
Aug 21, 2024

Conversation

alancai98
Copy link
Member

@alancai98 alancai98 commented Aug 5, 2024

Relevant Issues

Description

Other Information

  • Updated Unreleased Section in CHANGELOG: [NO]

  • Any backward-incompatible changes? [YES]

    • Changes to AST and plan but on v1 branch
  • Any new external dependencies? [NO]

  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES]

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@alancai98 alancai98 changed the title Refactor representation of set ops [draft] [v1] Refactor representation of set ops Aug 5, 2024
Copy link

github-actions bot commented Aug 5, 2024

CROSS-ENGINE Conformance Report ❌

BASE (LEGACY-FAB43BF) TARGET (EVAL-FAB43BF) +/-
% Passing 90.45% 96.62% 6.17% ✅
Passing 5333 5697 364 ✅
Failing 563 199 -364 ✅
Ignored 0 0 0 ✅
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: fab43bf
  • Base Engine: LEGACY
  • Target Commit: fab43bf
  • Target Engine: EVAL

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing Tests. ❌
  • Passing in both: 5205
  • Failing in both: 71
  • PASSING in BASE but now FAILING in TARGET: 128
  • FAILING in BASE but now PASSING in TARGET: 492

Now Failing Tests ❌

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Now Passing Tests

492 test(s) were previously failing in BASE (LEGACY-FAB43BF) but now pass in TARGET (EVAL-FAB43BF). Before merging, confirm they are intended to pass.

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

CROSS-COMMIT-LEGACY Conformance Report ❌

BASE (LEGACY-7AEB1BE) TARGET (LEGACY-FAB43BF) +/-
% Passing 90.72% 90.45% -0.27% ⭕
Passing 5329 5333 4 ✅
Failing 545 563 18 ⭕
Ignored 0 0 0 ✅
Total Tests 5874 5896 22 ✅

Testing Details

  • Base Commit: 7aeb1be
  • Base Engine: LEGACY
  • Target Commit: fab43bf
  • Target Engine: LEGACY

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing Tests. ❌
  • Passing in both: 5324
  • Failing in both: 539
  • PASSING in BASE but now FAILING in TARGET: 5
  • FAILING in BASE but now PASSING in TARGET: 0

Now Failing Tests ❌

The following 5 test(s) were previously PASSING in BASE but are now FAILING in TARGET:

Click here to see
  1. outerUnionCoerceScalar, compileOption: LEGACY
  2. outerUnionCoerceStruct, compileOption: LEGACY
  3. outerUnionCoerceNullMissing, compileOption: PERMISSIVE
  4. outerUnionCoerceNullMissing, compileOption: LEGACY
  5. Example 6 — Value Coercion; Coercion of single value, compileOption: LEGACY

CROSS-COMMIT-EVAL Conformance Report ✅

BASE (EVAL-7AEB1BE) TARGET (EVAL-FAB43BF) +/-
% Passing 96.41% 96.62% 0.22% ✅
Passing 5664 5697 33 ✅
Failing 211 199 -12 ✅
Ignored 0 0 0 ✅
Total Tests 5875 5896 21 ✅

Testing Details

  • Base Commit: 7aeb1be
  • Base Engine: EVAL
  • Target Commit: fab43bf
  • Target Engine: EVAL

Result Details

  • Passing in both: 5664
  • Failing in both: 199
  • PASSING in BASE but now FAILING in TARGET: 0
  • FAILING in BASE but now PASSING in TARGET: 6

Now Passing Tests

The following 6 test(s) were previously FAILING in BASE but are now PASSING in TARGET. Before merging, confirm they are intended to pass:

Click here to see
  1. Example 6 — Value Coercion, compileOption: LEGACY
  2. outerUnionCoerceScalar, compileOption: LEGACY
  3. outerUnionCoerceStruct, compileOption: LEGACY
  4. outerUnionCoerceNullMissing, compileOption: PERMISSIVE
  5. outerUnionCoerceNullMissing, compileOption: LEGACY
  6. Example 6 — Value Coercion; Coercion of single value, compileOption: LEGACY

type: '.set_op',
operand: '.expr.s_f_w',
},
query_set::{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) primary addition of this PR is adding query_set and query_expr to the AST. This modeling allows us to define ORDER BY, LIMIT, and OFFSET on SFW queries as well as SQL set ops.

Modeling is partially based off

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Would we want to add the naming followed by PLR and SQLParser? AKA Query is the top-level product type holding the expression, order-by, etc. And the union types QuerySet (PLR) or SetExpr (SQLParser) holding the different variants (expr, select, wrapped Query, values, etc).

Especially since the Java name will now be: Expr.QuerySet and QueryExpr. I found it a bit confusing in the rest of the code. Could just be: Expr.Query and Expr.Query.Body.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more context on PLR's AST definition in the partiql-tests PR partiql/partiql-tests#121 (comment). The AST definition I used in this PR allows for us to be more strict in what expressions we want to support with LIMIT, OFFSET, and ORDER BY. For now, I'm thinking we only permit those clauses on

  • SFW
  • SQL set ops
  • PartiQL bag ops

PLR also allows those clauses on general expressions. Since it's not in the PartiQL spec, I haven't chosen to allow those clauses on general expressions.

I agree that the previous naming was confusing, so updated to use query_body rather than query_expr.

@@ -320,7 +320,14 @@ class QueryPrettyPrinter {
is PartiqlAst.Expr.Or -> writeNAryOperator("OR", node.operands, sb, level)
is PartiqlAst.Expr.InCollection -> writeNAryOperator("IN", node.operands, sb, level)
is PartiqlAst.Expr.BagOp -> {
var name = node.op.javaClass.simpleName.toUpperCase().replace("_", " ")
var name = when (node.op) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) changes to partiql-lang tests follow what I did for a similar PR targeting v0.14.6 to fix some legacy tests -- #1506.

@@ -322,7 +342,7 @@ rel::{
_: [
call::{
agg: ref,
set_quantifier: [ ALL, DISTINCT ],
setq: set_quantifier,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) moved set_quantifier out of this nested aggregation plan node. Also renamed to setq for consistency w/ the set ops.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) currently points to the source branch of partiql/partiql-tests#121. Once that partiql-tests PR is merged in, I will update the submodule commit.

@OptIn(PartiQLValueExperimental::class)
internal fun PartiQLValue.toBag(): BagValue<*> {
TODO("For OUTER set operators")
internal fun Datum.asIterator(coerce: Boolean): Iterator<Datum> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) coercion behavior is slightly different than other type check exceptions in permissive mode. For the bag operators, we coerce to a bag rather than return missing.

@alancai98 alancai98 changed the title [draft] [v1] Refactor representation of set ops [v1] Refactor representation of set ops Aug 9, 2024
@alancai98 alancai98 marked this pull request as ready for review August 9, 2024 22:31
@alancai98 alancai98 self-assigned this Aug 9, 2024
@alancai98 alancai98 requested a review from johnedquinn August 9, 2024 22:32
Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to go through the evaluation implementations for the bag ops -- but overall looks good. I've left some comments on the modelling.

Comment on lines +545 to +547
order=orderByClause??
limit=limitClause??
offset=offsetByClause?? # SfwQuery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me think. Can we remove these completely? Since the AST nodes don't allow for this, this may result in ambiguous results. I believe SQL would only allow this by using parenthesis -- which I believe would be handled even if removing this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the AST nodes don't allow for this, this may result in ambiguous results.

Could you clarify on this? Are you referring to if a set op query is defined without parens and it includes a LIMIT, OFFSET, or ORDER BY? For example,

SELECT * FROM a UNION SELECT * FROM b LIMIT 5

In this case, I believe the parsing behavior we want is to apply the LIMIT to the UNION set operator. This behavior follows what PostgreSQL's parsing does - https://www.postgresql.org/docs/current/queries-union.html.

type: '.set_op',
operand: '.expr.s_f_w',
},
query_set::{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Would we want to add the naming followed by PLR and SQLParser? AKA Query is the top-level product type holding the expression, order-by, etc. And the union types QuerySet (PLR) or SetExpr (SQLParser) holding the different variants (expr, select, wrapped Query, values, etc).

Especially since the Java name will now be: Expr.QuerySet and QueryExpr. I found it a bit confusing in the rest of the code. Could just be: Expr.Query and Expr.Query.Body.

@alancai98
Copy link
Member Author

Thanks for the review @johnedquinn. Based on the offline discussion, we decided on the following changes

  • allow LIMIT, OFFSET, and ORDER BY on the outer bag ops
  • coercion function for NULL and MISSING will the same as the spec's FROM source coercion (basically wrap the absent type in a bag rather than empty bag).

Will mark this PR as a draft till I address those concerns and the other PR comments.

@alancai98 alancai98 marked this pull request as draft August 14, 2024 00:55
@alancai98 alancai98 force-pushed the v1-fix-set-ops-ast branch 2 times, most recently from 359f674 to 22ca247 Compare August 16, 2024 21:38
val quantifier = when (setOp.type.setq) {
SetQuantifier.ALL -> Rel.Op.Set.Quantifier.ALL
null, SetQuantifier.DISTINCT -> Rel.Op.Set.Quantifier.DISTINCT
private fun convertSetOp(setExpr: QueryBody.SetOp): Rel {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) one benefit of reusing the SQL set op plan nodes for the PartiQL bag ops is that we can reuse the Rel scan behavior for non-collections (i.e. coercing to a bag in permissive and erroring in strict mode).

@alancai98 alancai98 marked this pull request as ready for review August 16, 2024 21:47
@alancai98 alancai98 requested a review from johnedquinn August 16, 2024 21:47
@codecov-commenter
Copy link

codecov-commenter commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 76.00000% with 6 lines in your changes missing coverage. Please review.

Please upload report for BASE (v1@db9b6bd). Learn more about missing BASE report.

Files Patch % Lines
...org/partiql/lang/prettyprint/QueryPrettyPrinter.kt 42.85% 3 Missing and 1 partial ⚠️
.../org/partiql/lang/syntax/impl/PartiQLPigVisitor.kt 88.88% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##             v1    #1538   +/-   ##
=====================================
  Coverage      ?   77.15%           
  Complexity    ?     2514           
=====================================
  Files         ?      254           
  Lines         ?    18583           
  Branches      ?     3523           
=====================================
  Hits          ?    14337           
  Misses        ?     3212           
  Partials      ?     1034           
Flag Coverage Δ
EXAMPLES 80.07% <ø> (?)
LANG 77.06% <76.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Once the tests PR is merged and the submodule commit is updated, I can approve.

@alancai98
Copy link
Member Author

Since the last review,

Comment on lines 10 to 17
fun Record.toDatumArrayCoerceMissing(): Array<Datum> = Array(this.values.size) {
val d = [email protected][it]
when (d.isMissing) {
true -> Datum.nullValue()
else -> d
}
}
Copy link
Member

@johnedquinn johnedquinn Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just clarifying why you're doing this: even though the comparator treats null/missing as equal, it isn't deterministic which value will be returned in the TreeSet -- so you need to coerce them ahead of time.

Also, I'm leaving a suggestion here because I don't think you need to create a new Array and move all previous pointers to the new one.

Suggested change
fun Record.toDatumArrayCoerceMissing(): Array<Datum> = Array(this.values.size) {
val d = this@toDatumArrayCoerceMissing.values[it]
when (d.isMissing) {
true -> Datum.nullValue()
else -> d
}
}
fun Array<Datum>.coerceMissing() {
for (i in 0 until this.size) {
if (this[i].isMissing) {
this[i] = Datum.nullValue()
}
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just clarifying why you're doing this: even though the comparator treats null/missing as equal, it isn't deterministic which value will be returned in the TreeSet -- so you need to coerce them ahead of time.

Yes, that's correct. I added a brief comment in the latest commit to give some more context on the behavior.


Also, I'm leaving a suggestion here because I don't think you need to create a new Array and move all previous pointers to the new one.

Thanks for the suggestion. You're right -- it can be simplified quite a bit. I applied your suggestion in the latest commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants