Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6301] Extend ‘Must-filter’ columns to support a conditional bypass list #3984

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

olivrlee
Copy link
Contributor

No description provided.

…bypass-field being selected but not filtered on, a must-filter field not being selected (thus can no longer be defused), and filtering on the bypass field in the enclosing query, thus defusing the must-filter field

Fix style
@olivrlee olivrlee marked this pull request as ready for review October 5, 2024 00:29
Copy link
Contributor

@tanclary tanclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a ton of context on these changes but they seem really thoroughly tested so approved w/ minor comments

@@ -1217,12 +1219,21 @@ protected void validateNamespace(final SqlValidatorNamespace namespace,
// fields; these are neutralized if the consuming query filters on them.
final ImmutableBitSet mustFilterFields =
namespace.getMustFilterFields();
if (!mustFilterFields.isEmpty()) {
// Remnant must filter fields are fields that are not selected and cannot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether the 'remnant' concept is necessary. Instead could we just remove must-filter fields that have been satisfied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is necessary to handle the case where the bypass field is selected, but the must-filter fields are not selected.

Imagine this setup

Table: EMP (empno, name, id)
must_filters: empno
bypass_filter: name

Query:
select * from (select name from emp) where name = 'bob'

The inner query select name from emp can't error on empno not being filtered at this point because name was selected, and could be filtered on later. The must-filter for empno is not satisfied yet at this point.

When we get to the select * from _ where name = 'bob', that's where we defuse the error for empno based on seeing name being filtered on.

If alternatively the query was
select * from (select name, id from emp) where id = '12'
When we get to select * from _ where id = '12', we're at the top-level, and remnantMustFilter's set is not empty, hence we throw an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After revising and simplifying the 3 fields mustFilterFields, mustFilterBypassFields and remnantMustFilterFields into a single class, I was able to simplify away the interface and no longer need to expose getRemnantFilterFields to users of SemanticTable. I'll push new commits soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If t has a must-filter field f and bypass fields b0 and b1, then the query select b0, b1 from t has no must-filter fields, no bypass fields, but two remnant fields [b0, b1].

If you filter on any of those remnant fields then they all go away.

Copy link
Contributor Author

@olivrlee olivrlee Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that is not how the sets are populated.
The purpose of remnant fields is to capture the fields that were marked as required, but no longer accessible to be filtered on and defused in the select clause. In other words, you cannot filter on any of the remnant fields. The presence of values being in the remant-filter set serves the purpose of being an indicator for erroring.

They can only be defused if any of the bypass fields are filtered on.

In your example this would be the correct set values:
select b0, b1 from t where f is a must-filter field

must-filter field: []
bypass-fields: [b0, b1]
remnant-fields: [f]

* in the current query, but can still be defused by filtering on a bypass field in the
* enclosing query.
*/
public class MustFilterRequirements {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must the class be public?

Class names should not be plural (except utility classes). It becomes problematic to name collections of such classes.

'Must' and 'Requirement' seem redundant. So I suggest 'FilterRequirement'.

Maybe mustFilterFieldsfilterFields, mustFilterBypassFieldsbypassFields; remnantMustFilterFieldsfilterAnyFields.

Could the fields be final? (Public final fields are fine, but if you must have non-final fields they should probably be private.)

Immutable classes are much easier to reason about. Consider adding 'withXxx' methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be package-private.
I'll rename it to FilterRequirement and filterFields, bypassFields.

Still thinking about the rename of remnantMustFilterFields ... I still think the name needs to capture the connotation that they're valid filter requirements left-over from a previous query, but also distinct from the set that can actively be defused

@julianhyde
Copy link
Contributor

Latest commit looks fairly close.

There needs to be an cogent explanation in the code for the concepts of bypass fields etc. The javadoc for class FilterRequirement seems like a good place. Add a couple of worked examples, like my 'table t has must-filter field f and bypass field b and the query "select ... from t" has ...'.

Move the current descriptions of the 3 fields onto the fields themselves.

'Remnant' is not a good name because it forces you to talk about how we got here - the winnowing algorithm - rather than there we are. I prefer 'anyFitlerFields' because for the query to be valid you need to filter any of them.

I think you can make FilterRequirement immutable if you have a subclass of SqlVisitor that has 3 mutable fields.

Copy link

@olivrlee olivrlee requested a review from julianhyde October 21, 2024 21:09
Copy link

github-actions bot commented Dec 5, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 5, 2024
@olivrlee
Copy link
Contributor Author

Still requires a review before final merge, but last commit was fairly close.

@github-actions github-actions bot removed the stale label Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants