-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-6301] Extend ‘Must-filter’ columns to support a conditional bypass list #3984
base: main
Are you sure you want to change the base?
Changes from 7 commits
2841b68
2084793
028c6ce
0dd3461
1a90d10
3fe4a5c
b0ec462
d1e6027
90b0f5b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to you under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package org.apache.calcite.sql.validate; | ||
|
||
import org.apache.calcite.util.ImmutableBitSet; | ||
|
||
import com.google.common.collect.ImmutableSet; | ||
|
||
/** | ||
* Class that encapsulates filtering requirements when overloading SemanticTable. | ||
* | ||
* <p>mustFilterFields: the ordinals (in the row type) of the "must-filter" fields, | ||
* fields that must be filtered in a query. | ||
* <p> mustFilterBypassFields: the ordinals (in the row type) of the "bypass" fields, | ||
* fields that can defuse validation errors on mustFilterFields if filtered on. | ||
* <p> remnantMustFilterFields: Set of mustFilterField SqlQualifieds that have not been defused | ||
* in the current query, but can still be defused by filtering on a bypass field in the | ||
* enclosing query. | ||
*/ | ||
public class MustFilterRequirements { | ||
|
||
ImmutableBitSet mustFilterFields; | ||
ImmutableBitSet mustFilterBypassFields; | ||
protected ImmutableSet<SqlQualified> remnantMustFilterFields; | ||
public MustFilterRequirements(ImmutableBitSet mustFilterFields, | ||
ImmutableBitSet mustFilterBypassFields, ImmutableSet<SqlQualified> remnantMustFilterFields) { | ||
olivrlee marked this conversation as resolved.
Show resolved
Hide resolved
|
||
this.mustFilterFields = mustFilterFields; | ||
this.mustFilterBypassFields = mustFilterBypassFields; | ||
this.remnantMustFilterFields = remnantMustFilterFields; | ||
} | ||
|
||
public MustFilterRequirements() { | ||
this(ImmutableBitSet.of(), ImmutableBitSet.of(), ImmutableSet.of()); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -160,6 +160,7 @@ | |
import java.util.function.Supplier; | ||
import java.util.function.UnaryOperator; | ||
import java.util.stream.Collectors; | ||
import java.util.stream.Stream; | ||
import java.util.stream.StreamSupport; | ||
|
||
import static com.google.common.base.Preconditions.checkArgument; | ||
|
@@ -1212,17 +1213,27 @@ protected void validateNamespace(final SqlValidatorNamespace namespace, | |
RelDataType type = namespace.getType(); | ||
|
||
if (node == top) { | ||
// A top-level namespace must not return any must-filter fields. | ||
MustFilterRequirements mustFilterRequirements = namespace.getMustFilterRequirements(); | ||
// Either of the following two conditions result in an invalid query: | ||
// 1) A top-level namespace must not return any must-filter fields. | ||
// A non-top-level namespace (e.g. a subquery) may return must-filter | ||
// fields; these are neutralized if the consuming query filters on them. | ||
final ImmutableBitSet mustFilterFields = | ||
namespace.getMustFilterFields(); | ||
if (!mustFilterFields.isEmpty()) { | ||
// 2) A top-level namespace must not have any remnant-must-filter fields. | ||
// Remnant must filter fields are fields that are not selected and cannot | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure whether the 'remnant' concept is necessary. Instead could we just remove must-filter fields that have been satisfied? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is necessary to handle the case where the bypass field is selected, but the must-filter fields are not selected. Imagine this setup
Query: The inner query When we get to the If alternatively the query was There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After revising and simplifying the 3 fields There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see. If If you filter on any of those remnant fields then they all go away. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually that is not how the sets are populated. They can only be defused if any of the bypass fields are filtered on. In your example this would be the correct set values: must-filter field: [] |
||
// be defused unless a bypass field defuses it. | ||
if (!mustFilterRequirements.mustFilterFields.isEmpty() | ||
|| !mustFilterRequirements.remnantMustFilterFields.isEmpty()) { | ||
Stream<String> mustFilterStream = | ||
StreamSupport.stream(mustFilterRequirements.mustFilterFields.spliterator(), false) | ||
.map(namespace.getRowType().getFieldNames()::get); | ||
Stream<String> remnantStream = | ||
mustFilterRequirements.remnantMustFilterFields.stream() | ||
.map(q -> q.suffix().get(0)); | ||
|
||
// Set of field names, sorted alphabetically for determinism. | ||
Set<String> fieldNameSet = | ||
StreamSupport.stream(mustFilterFields.spliterator(), false) | ||
.map(namespace.getRowType().getFieldNames()::get) | ||
.collect(Collectors.toCollection(TreeSet::new)); | ||
Stream.concat(mustFilterStream, remnantStream) | ||
.collect(Collectors.toCollection(TreeSet::new)); | ||
throw newValidationError(node, | ||
RESOURCE.mustFilterFieldsMissing(fieldNameSet.toString())); | ||
} | ||
|
@@ -4098,64 +4109,7 @@ protected void validateSelect( | |
validateSelectList(selectItems, select, targetRowType); | ||
ns.setType(rowType); | ||
|
||
// Deduce which columns must be filtered. | ||
ns.mustFilterFields = ImmutableBitSet.of(); | ||
if (from != null) { | ||
final Set<SqlQualified> qualifieds = new LinkedHashSet<>(); | ||
for (ScopeChild child : fromScope.children) { | ||
final List<String> fieldNames = | ||
child.namespace.getRowType().getFieldNames(); | ||
child.namespace.getMustFilterFields() | ||
.forEachInt(i -> | ||
qualifieds.add( | ||
SqlQualified.create(fromScope, 1, child.namespace, | ||
new SqlIdentifier( | ||
ImmutableList.of(child.name, fieldNames.get(i)), | ||
SqlParserPos.ZERO)))); | ||
} | ||
if (!qualifieds.isEmpty()) { | ||
if (select.getWhere() != null) { | ||
forEachQualified(select.getWhere(), getWhereScope(select), | ||
qualifieds::remove); | ||
} | ||
if (select.getHaving() != null) { | ||
forEachQualified(select.getHaving(), getHavingScope(select), | ||
qualifieds::remove); | ||
} | ||
|
||
// Each of the must-filter fields identified must be returned as a | ||
// SELECT item, which is then flagged as must-filter. | ||
final BitSet mustFilterFields = new BitSet(); | ||
final List<SqlNode> expandedSelectItems = | ||
requireNonNull(fromScope.getExpandedSelectList(), | ||
"expandedSelectList"); | ||
forEach(expandedSelectItems, (selectItem, i) -> { | ||
selectItem = stripAs(selectItem); | ||
if (selectItem instanceof SqlIdentifier) { | ||
SqlQualified qualified = | ||
fromScope.fullyQualify((SqlIdentifier) selectItem); | ||
if (qualifieds.remove(qualified)) { | ||
// SELECT item #i referenced a must-filter column that was not | ||
// filtered in the WHERE or HAVING. It becomes a must-filter | ||
// column for our consumer. | ||
mustFilterFields.set(i); | ||
} | ||
} | ||
}); | ||
|
||
// If there are must-filter fields that are not in the SELECT clause, | ||
// this is an error. | ||
if (!qualifieds.isEmpty()) { | ||
throw newValidationError(select, | ||
RESOURCE.mustFilterFieldsMissing( | ||
qualifieds.stream() | ||
.map(q -> q.suffix().get(0)) | ||
.collect(Collectors.toCollection(TreeSet::new)) | ||
.toString())); | ||
} | ||
ns.mustFilterFields = ImmutableBitSet.fromBitSet(mustFilterFields); | ||
} | ||
} | ||
validateMustFilterRequirements(select, ns); | ||
|
||
// Validate ORDER BY after we have set ns.rowType because in some | ||
// dialects you can refer to columns of the select list, e.g. | ||
|
@@ -4172,8 +4126,10 @@ protected void validateSelect( | |
} | ||
} | ||
|
||
/** For each identifier in an expression, resolves it to a qualified name | ||
* and calls the provided action. */ | ||
/** | ||
* For each identifier in an expression, resolves it to a qualified name | ||
* and calls the provided action. | ||
*/ | ||
private static void forEachQualified(SqlNode node, SqlValidatorScope scope, | ||
Consumer<SqlQualified> consumer) { | ||
node.accept(new SqlBasicVisitor<Void>() { | ||
|
@@ -4185,6 +4141,49 @@ private static void forEachQualified(SqlNode node, SqlValidatorScope scope, | |
}); | ||
} | ||
|
||
/** | ||
* Removes all entries from qualifieds and remnantMustFilterFields if the supplied SqlNode | ||
* is a bypassField. | ||
*/ | ||
private static void purgeForBypassFields(SqlNode node, SqlValidatorScope scope, | ||
Set<SqlQualified> qualifieds, Set<SqlQualified> bypassQualifieds, | ||
Set<SqlQualified> remnantMustFilterFields) { | ||
node.accept(new SqlBasicVisitor<Void>() { | ||
@Override public Void visit(SqlIdentifier id) { | ||
final SqlQualified qualified = scope.fullyQualify(id); | ||
if (bypassQualifieds.contains(qualified)) { | ||
// Clear all the must-filter qualifieds from the same table identifier | ||
Collection<SqlQualified> sameIdentifier = qualifieds.stream() | ||
.filter(q -> qualifiedMatchesIdentifier(q, qualified)) | ||
.collect(Collectors.toList()); | ||
sameIdentifier.forEach(qualifieds::remove); | ||
|
||
// Clear all the remnant must-filter qualifieds from the same table identifier | ||
Collection<SqlQualified> sameIdentifier_ = remnantMustFilterFields.stream() | ||
.filter(q -> qualifiedMatchesIdentifier(q, qualified)) | ||
.collect(Collectors.toList()); | ||
sameIdentifier_.forEach(remnantMustFilterFields::remove); | ||
} | ||
return null; | ||
} | ||
}); | ||
} | ||
|
||
private static void toQualifieds(ImmutableBitSet fields, Set<SqlQualified> qualifiedSet, | ||
SelectScope fromScope, ScopeChild child, List<String> fieldNames) { | ||
fields.forEachInt(i -> | ||
qualifiedSet.add( | ||
SqlQualified.create(fromScope, 1, child.namespace, | ||
new SqlIdentifier( | ||
ImmutableList.of(child.name, fieldNames.get(i)), | ||
SqlParserPos.ZERO)))); | ||
|
||
} | ||
|
||
private static boolean qualifiedMatchesIdentifier(SqlQualified q1, SqlQualified q2) { | ||
olivrlee marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return q1.identifier.names.get(0).equals(q2.identifier.names.get(0)); | ||
} | ||
|
||
private void checkRollUpInSelectList(SqlSelect select) { | ||
SqlValidatorScope scope = getSelectScope(select); | ||
for (SqlNode item : SqlNonNullableAccessors.getSelectList(select)) { | ||
|
@@ -4638,6 +4637,90 @@ protected void validateQualifyClause(SqlSelect select) { | |
} | ||
} | ||
|
||
protected void validateMustFilterRequirements(SqlSelect select, SelectNamespace ns) { | ||
ns.mustFilterRequirements = new MustFilterRequirements(); | ||
if (select.getFrom() != null) { | ||
final SelectScope fromScope = (SelectScope) getFromScope(select); | ||
final BitSet projectedNonFilteredBypassField = new BitSet(); | ||
final Set<SqlQualified> qualifieds = new LinkedHashSet<>(); | ||
final Set<SqlQualified> bypassQualifieds = new LinkedHashSet<>(); | ||
final Set<SqlQualified> remnantQualifieds = new LinkedHashSet<>(); | ||
for (ScopeChild child : fromScope.children) { | ||
final List<String> fieldNames = | ||
child.namespace.getRowType().getFieldNames(); | ||
MustFilterRequirements mustFilterReqs = child.namespace.getMustFilterRequirements(); | ||
toQualifieds(mustFilterReqs.mustFilterFields, qualifieds, fromScope, child, | ||
fieldNames); | ||
toQualifieds(mustFilterReqs.mustFilterBypassFields, bypassQualifieds, fromScope, | ||
child, fieldNames); | ||
remnantQualifieds.addAll(mustFilterReqs.remnantMustFilterFields); | ||
} | ||
if (!qualifieds.isEmpty() || !bypassQualifieds.isEmpty()) { | ||
if (select.getWhere() != null) { | ||
forEachQualified(select.getWhere(), getWhereScope(select), | ||
qualifieds::remove); | ||
purgeForBypassFields(select.getWhere(), getWhereScope(select), | ||
qualifieds, bypassQualifieds, remnantQualifieds); | ||
} | ||
if (select.getHaving() != null) { | ||
forEachQualified(select.getHaving(), getHavingScope(select), | ||
qualifieds::remove); | ||
purgeForBypassFields(select.getHaving(), getHavingScope(select), | ||
qualifieds, bypassQualifieds, remnantQualifieds); | ||
} | ||
|
||
// Each of the must-filter fields identified must be returned as a | ||
// SELECT item, which is then flagged as must-filter. | ||
final BitSet mustFilterFields = new BitSet(); | ||
// Each of the bypass fields identified must be returned as a | ||
// SELECT item, which is then flagged as a bypass field for the consumer. | ||
final BitSet mustFilterBypassFields = new BitSet(); | ||
final List<SqlNode> expandedSelectItems = | ||
requireNonNull(fromScope.getExpandedSelectList(), | ||
"expandedSelectList"); | ||
forEach(expandedSelectItems, (selectItem, i) -> { | ||
selectItem = stripAs(selectItem); | ||
if (selectItem instanceof SqlIdentifier) { | ||
SqlQualified qualified = | ||
fromScope.fullyQualify((SqlIdentifier) selectItem); | ||
if (qualifieds.remove(qualified)) { | ||
// SELECT item #i referenced a must-filter column that was not | ||
// filtered in the WHERE or HAVING. It becomes a must-filter | ||
// column for our consumer. | ||
mustFilterFields.set(i); | ||
} | ||
if (bypassQualifieds.remove(qualified)) { | ||
// SELECT item #i referenced a bypass column that was not filtered in the WHERE | ||
// or HAVING. It becomes a bypass column for our consumer. | ||
mustFilterBypassFields.set(i); | ||
projectedNonFilteredBypassField.set(0); | ||
} | ||
} | ||
}); | ||
|
||
// If there are must-filter fields that are not in the SELECT clause and there were no | ||
// bypass-fields on this table, this is an error. | ||
if (!qualifieds.isEmpty() && !projectedNonFilteredBypassField.get(0)) { | ||
throw newValidationError(select, | ||
RESOURCE.mustFilterFieldsMissing( | ||
qualifieds.stream() | ||
.map(q -> q.suffix().get(0)) | ||
.collect(Collectors.toCollection(TreeSet::new)) | ||
.toString())); | ||
} | ||
// Remaining must-filter fields can be defused by a bypass-field, | ||
// so we pass this to the consumer. | ||
ImmutableSet<SqlQualified> remnantMustFilterFields = | ||
Stream.of(remnantQualifieds, qualifieds) | ||
.flatMap(Set::stream).collect(ImmutableSet.toImmutableSet()); | ||
olivrlee marked this conversation as resolved.
Show resolved
Hide resolved
|
||
ns.mustFilterRequirements = | ||
new MustFilterRequirements(ImmutableBitSet.fromBitSet(mustFilterFields), | ||
ImmutableBitSet.fromBitSet(mustFilterBypassFields), | ||
olivrlee marked this conversation as resolved.
Show resolved
Hide resolved
|
||
remnantMustFilterFields); | ||
} | ||
} | ||
} | ||
|
||
@Override public void validateWith(SqlWith with, SqlValidatorScope scope) { | ||
final SqlValidatorNamespace namespace = getNamespaceOrThrow(with); | ||
validateNamespace(namespace, unknownType); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must the class be
public
?Class names should not be plural (except utility classes). It becomes problematic to name collections of such classes.
'Must' and 'Requirement' seem redundant. So I suggest 'FilterRequirement'.
Maybe
mustFilterFields
→filterFields
,mustFilterBypassFields
→bypassFields
;remnantMustFilterFields
→filterAnyFields
.Could the fields be
final
? (Public final fields are fine, but if you must have non-final fields they should probably beprivate
.)Immutable classes are much easier to reason about. Consider adding 'withXxx' methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be package-private.
I'll rename it to
FilterRequirement
andfilterFields
,bypassFields
.Still thinking about the rename of
remnantMustFilterFields
... I still think the name needs to capture the connotation that they're valid filter requirements left-over from a previous query, but also distinct from the set that can actively be defused