Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6301] Extend ‘Must-filter’ columns to support a conditional bypass list #3984

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,6 @@
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rel.type.RelDataTypeField;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.util.ImmutableBitSet;
import org.apache.calcite.util.Pair;
import org.apache.calcite.util.Util;

Expand Down Expand Up @@ -58,9 +57,13 @@ abstract class AbstractNamespace implements SqlValidatorNamespace {
/** As {@link #rowType}, but not necessarily a struct. */
protected @Nullable RelDataType type;

/** Ordinals of fields that must be filtered. Initially the empty set, but
* should typically be re-assigned on validate. */
protected ImmutableBitSet mustFilterFields = ImmutableBitSet.of();
/**
* Class that holds information about what fields need to be filtered, what bypass-fields
* can defuse the errors if they are filtered on as an alternative, and a set used during
* validation internally. Initialized as empty object, but should typically be re-assiged
* on validate.
*/
protected MustFilterRequirements mustFilterRequirements = new MustFilterRequirements();

protected final @Nullable SqlNode enclosingNode;

Expand Down Expand Up @@ -164,8 +167,8 @@ abstract class AbstractNamespace implements SqlValidatorNamespace {
return ImmutableList.of();
}

@Override public ImmutableBitSet getMustFilterFields() {
return requireNonNull(mustFilterFields,
@Override public MustFilterRequirements getMustFilterRequirements() {
return requireNonNull(mustFilterRequirements,
"mustFilterFields (maybe validation is not complete?)");
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -210,7 +210,7 @@ private SqlValidatorNamespace resolveImpl(SqlIdentifier id) {
}
}

this.mustFilterFields = resolvedNamespace.getMustFilterFields();
mustFilterRequirements = resolvedNamespace.getMustFilterRequirements();
RelDataType rowType = resolvedNamespace.getRowType();

if (extendList != null) {
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to you under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.calcite.sql.validate;

import org.apache.calcite.util.ImmutableBitSet;

import com.google.common.collect.ImmutableSet;

/**
* Class that encapsulates filtering requirements when overloading SemanticTable.
*
* <p>mustFilterFields: the ordinals (in the row type) of the "must-filter" fields,
* fields that must be filtered in a query.
* <p> mustFilterBypassFields: the ordinals (in the row type) of the "bypass" fields,
* fields that can defuse validation errors on mustFilterFields if filtered on.
* <p> remnantMustFilterFields: Set of mustFilterField SqlQualifieds that have not been defused
* in the current query, but can still be defused by filtering on a bypass field in the
* enclosing query.
*/
public class MustFilterRequirements {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must the class be public?

Class names should not be plural (except utility classes). It becomes problematic to name collections of such classes.

'Must' and 'Requirement' seem redundant. So I suggest 'FilterRequirement'.

Maybe mustFilterFieldsfilterFields, mustFilterBypassFieldsbypassFields; remnantMustFilterFieldsfilterAnyFields.

Could the fields be final? (Public final fields are fine, but if you must have non-final fields they should probably be private.)

Immutable classes are much easier to reason about. Consider adding 'withXxx' methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be package-private.
I'll rename it to FilterRequirement and filterFields, bypassFields.

Still thinking about the rename of remnantMustFilterFields ... I still think the name needs to capture the connotation that they're valid filter requirements left-over from a previous query, but also distinct from the set that can actively be defused


ImmutableBitSet mustFilterFields;
ImmutableBitSet mustFilterBypassFields;
protected ImmutableSet<SqlQualified> remnantMustFilterFields;
public MustFilterRequirements(ImmutableBitSet mustFilterFields,
ImmutableBitSet mustFilterBypassFields, ImmutableSet<SqlQualified> remnantMustFilterFields) {
olivrlee marked this conversation as resolved.
Show resolved Hide resolved
this.mustFilterFields = mustFilterFields;
this.mustFilterBypassFields = mustFilterBypassFields;
this.remnantMustFilterFields = remnantMustFilterFields;
}

public MustFilterRequirements() {
this(ImmutableBitSet.of(), ImmutableBitSet.of(), ImmutableSet.of());
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,12 @@
*/
package org.apache.calcite.sql.validate;

import com.google.common.collect.ImmutableList;

import org.checkerframework.checker.nullness.qual.Nullable;

import java.util.List;

/**
* Extension to {@link SqlValidatorTable} with extra, optional metadata.
*
Expand All @@ -44,4 +48,12 @@ public interface SemanticTable {
default boolean mustFilter(int column) {
return getFilter(column) != null;
}

/**
* Returns a list of column ordinals (0-based) of fields that defuse
* must-filter columns when filtered on.
*/
default List<Integer> bypassFieldList() {
return ImmutableList.of();
}
}
217 changes: 150 additions & 67 deletions core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,7 @@
import java.util.function.Supplier;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

import static com.google.common.base.Preconditions.checkArgument;
Expand Down Expand Up @@ -1212,17 +1213,27 @@ protected void validateNamespace(final SqlValidatorNamespace namespace,
RelDataType type = namespace.getType();

if (node == top) {
// A top-level namespace must not return any must-filter fields.
MustFilterRequirements mustFilterRequirements = namespace.getMustFilterRequirements();
// Either of the following two conditions result in an invalid query:
// 1) A top-level namespace must not return any must-filter fields.
// A non-top-level namespace (e.g. a subquery) may return must-filter
// fields; these are neutralized if the consuming query filters on them.
final ImmutableBitSet mustFilterFields =
namespace.getMustFilterFields();
if (!mustFilterFields.isEmpty()) {
// 2) A top-level namespace must not have any remnant-must-filter fields.
// Remnant must filter fields are fields that are not selected and cannot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether the 'remnant' concept is necessary. Instead could we just remove must-filter fields that have been satisfied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is necessary to handle the case where the bypass field is selected, but the must-filter fields are not selected.

Imagine this setup

Table: EMP (empno, name, id)
must_filters: empno
bypass_filter: name

Query:
select * from (select name from emp) where name = 'bob'

The inner query select name from emp can't error on empno not being filtered at this point because name was selected, and could be filtered on later. The must-filter for empno is not satisfied yet at this point.

When we get to the select * from _ where name = 'bob', that's where we defuse the error for empno based on seeing name being filtered on.

If alternatively the query was
select * from (select name, id from emp) where id = '12'
When we get to select * from _ where id = '12', we're at the top-level, and remnantMustFilter's set is not empty, hence we throw an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After revising and simplifying the 3 fields mustFilterFields, mustFilterBypassFields and remnantMustFilterFields into a single class, I was able to simplify away the interface and no longer need to expose getRemnantFilterFields to users of SemanticTable. I'll push new commits soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If t has a must-filter field f and bypass fields b0 and b1, then the query select b0, b1 from t has no must-filter fields, no bypass fields, but two remnant fields [b0, b1].

If you filter on any of those remnant fields then they all go away.

Copy link
Contributor Author

@olivrlee olivrlee Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that is not how the sets are populated.
The purpose of remnant fields is to capture the fields that were marked as required, but no longer accessible to be filtered on and defused in the select clause. In other words, you cannot filter on any of the remnant fields. The presence of values being in the remant-filter set serves the purpose of being an indicator for erroring.

They can only be defused if any of the bypass fields are filtered on.

In your example this would be the correct set values:
select b0, b1 from t where f is a must-filter field

must-filter field: []
bypass-fields: [b0, b1]
remnant-fields: [f]

// be defused unless a bypass field defuses it.
if (!mustFilterRequirements.mustFilterFields.isEmpty()
|| !mustFilterRequirements.remnantMustFilterFields.isEmpty()) {
Stream<String> mustFilterStream =
StreamSupport.stream(mustFilterRequirements.mustFilterFields.spliterator(), false)
.map(namespace.getRowType().getFieldNames()::get);
Stream<String> remnantStream =
mustFilterRequirements.remnantMustFilterFields.stream()
.map(q -> q.suffix().get(0));

// Set of field names, sorted alphabetically for determinism.
Set<String> fieldNameSet =
StreamSupport.stream(mustFilterFields.spliterator(), false)
.map(namespace.getRowType().getFieldNames()::get)
.collect(Collectors.toCollection(TreeSet::new));
Stream.concat(mustFilterStream, remnantStream)
.collect(Collectors.toCollection(TreeSet::new));
throw newValidationError(node,
RESOURCE.mustFilterFieldsMissing(fieldNameSet.toString()));
}
Expand Down Expand Up @@ -4098,64 +4109,7 @@ protected void validateSelect(
validateSelectList(selectItems, select, targetRowType);
ns.setType(rowType);

// Deduce which columns must be filtered.
ns.mustFilterFields = ImmutableBitSet.of();
if (from != null) {
final Set<SqlQualified> qualifieds = new LinkedHashSet<>();
for (ScopeChild child : fromScope.children) {
final List<String> fieldNames =
child.namespace.getRowType().getFieldNames();
child.namespace.getMustFilterFields()
.forEachInt(i ->
qualifieds.add(
SqlQualified.create(fromScope, 1, child.namespace,
new SqlIdentifier(
ImmutableList.of(child.name, fieldNames.get(i)),
SqlParserPos.ZERO))));
}
if (!qualifieds.isEmpty()) {
if (select.getWhere() != null) {
forEachQualified(select.getWhere(), getWhereScope(select),
qualifieds::remove);
}
if (select.getHaving() != null) {
forEachQualified(select.getHaving(), getHavingScope(select),
qualifieds::remove);
}

// Each of the must-filter fields identified must be returned as a
// SELECT item, which is then flagged as must-filter.
final BitSet mustFilterFields = new BitSet();
final List<SqlNode> expandedSelectItems =
requireNonNull(fromScope.getExpandedSelectList(),
"expandedSelectList");
forEach(expandedSelectItems, (selectItem, i) -> {
selectItem = stripAs(selectItem);
if (selectItem instanceof SqlIdentifier) {
SqlQualified qualified =
fromScope.fullyQualify((SqlIdentifier) selectItem);
if (qualifieds.remove(qualified)) {
// SELECT item #i referenced a must-filter column that was not
// filtered in the WHERE or HAVING. It becomes a must-filter
// column for our consumer.
mustFilterFields.set(i);
}
}
});

// If there are must-filter fields that are not in the SELECT clause,
// this is an error.
if (!qualifieds.isEmpty()) {
throw newValidationError(select,
RESOURCE.mustFilterFieldsMissing(
qualifieds.stream()
.map(q -> q.suffix().get(0))
.collect(Collectors.toCollection(TreeSet::new))
.toString()));
}
ns.mustFilterFields = ImmutableBitSet.fromBitSet(mustFilterFields);
}
}
validateMustFilterRequirements(select, ns);

// Validate ORDER BY after we have set ns.rowType because in some
// dialects you can refer to columns of the select list, e.g.
Expand All @@ -4172,8 +4126,10 @@ protected void validateSelect(
}
}

/** For each identifier in an expression, resolves it to a qualified name
* and calls the provided action. */
/**
* For each identifier in an expression, resolves it to a qualified name
* and calls the provided action.
*/
private static void forEachQualified(SqlNode node, SqlValidatorScope scope,
Consumer<SqlQualified> consumer) {
node.accept(new SqlBasicVisitor<Void>() {
Expand All @@ -4185,6 +4141,49 @@ private static void forEachQualified(SqlNode node, SqlValidatorScope scope,
});
}

/**
* Removes all entries from qualifieds and remnantMustFilterFields if the supplied SqlNode
* is a bypassField.
*/
private static void purgeForBypassFields(SqlNode node, SqlValidatorScope scope,
Set<SqlQualified> qualifieds, Set<SqlQualified> bypassQualifieds,
Set<SqlQualified> remnantMustFilterFields) {
node.accept(new SqlBasicVisitor<Void>() {
@Override public Void visit(SqlIdentifier id) {
final SqlQualified qualified = scope.fullyQualify(id);
if (bypassQualifieds.contains(qualified)) {
// Clear all the must-filter qualifieds from the same table identifier
Collection<SqlQualified> sameIdentifier = qualifieds.stream()
.filter(q -> qualifiedMatchesIdentifier(q, qualified))
.collect(Collectors.toList());
sameIdentifier.forEach(qualifieds::remove);

// Clear all the remnant must-filter qualifieds from the same table identifier
Collection<SqlQualified> sameIdentifier_ = remnantMustFilterFields.stream()
.filter(q -> qualifiedMatchesIdentifier(q, qualified))
.collect(Collectors.toList());
sameIdentifier_.forEach(remnantMustFilterFields::remove);
}
return null;
}
});
}

private static void toQualifieds(ImmutableBitSet fields, Set<SqlQualified> qualifiedSet,
SelectScope fromScope, ScopeChild child, List<String> fieldNames) {
fields.forEachInt(i ->
qualifiedSet.add(
SqlQualified.create(fromScope, 1, child.namespace,
new SqlIdentifier(
ImmutableList.of(child.name, fieldNames.get(i)),
SqlParserPos.ZERO))));

}

private static boolean qualifiedMatchesIdentifier(SqlQualified q1, SqlQualified q2) {
olivrlee marked this conversation as resolved.
Show resolved Hide resolved
return q1.identifier.names.get(0).equals(q2.identifier.names.get(0));
}

private void checkRollUpInSelectList(SqlSelect select) {
SqlValidatorScope scope = getSelectScope(select);
for (SqlNode item : SqlNonNullableAccessors.getSelectList(select)) {
Expand Down Expand Up @@ -4638,6 +4637,90 @@ protected void validateQualifyClause(SqlSelect select) {
}
}

protected void validateMustFilterRequirements(SqlSelect select, SelectNamespace ns) {
ns.mustFilterRequirements = new MustFilterRequirements();
if (select.getFrom() != null) {
final SelectScope fromScope = (SelectScope) getFromScope(select);
final BitSet projectedNonFilteredBypassField = new BitSet();
final Set<SqlQualified> qualifieds = new LinkedHashSet<>();
final Set<SqlQualified> bypassQualifieds = new LinkedHashSet<>();
final Set<SqlQualified> remnantQualifieds = new LinkedHashSet<>();
for (ScopeChild child : fromScope.children) {
final List<String> fieldNames =
child.namespace.getRowType().getFieldNames();
MustFilterRequirements mustFilterReqs = child.namespace.getMustFilterRequirements();
toQualifieds(mustFilterReqs.mustFilterFields, qualifieds, fromScope, child,
fieldNames);
toQualifieds(mustFilterReqs.mustFilterBypassFields, bypassQualifieds, fromScope,
child, fieldNames);
remnantQualifieds.addAll(mustFilterReqs.remnantMustFilterFields);
}
if (!qualifieds.isEmpty() || !bypassQualifieds.isEmpty()) {
if (select.getWhere() != null) {
forEachQualified(select.getWhere(), getWhereScope(select),
qualifieds::remove);
purgeForBypassFields(select.getWhere(), getWhereScope(select),
qualifieds, bypassQualifieds, remnantQualifieds);
}
if (select.getHaving() != null) {
forEachQualified(select.getHaving(), getHavingScope(select),
qualifieds::remove);
purgeForBypassFields(select.getHaving(), getHavingScope(select),
qualifieds, bypassQualifieds, remnantQualifieds);
}

// Each of the must-filter fields identified must be returned as a
// SELECT item, which is then flagged as must-filter.
final BitSet mustFilterFields = new BitSet();
// Each of the bypass fields identified must be returned as a
// SELECT item, which is then flagged as a bypass field for the consumer.
final BitSet mustFilterBypassFields = new BitSet();
final List<SqlNode> expandedSelectItems =
requireNonNull(fromScope.getExpandedSelectList(),
"expandedSelectList");
forEach(expandedSelectItems, (selectItem, i) -> {
selectItem = stripAs(selectItem);
if (selectItem instanceof SqlIdentifier) {
SqlQualified qualified =
fromScope.fullyQualify((SqlIdentifier) selectItem);
if (qualifieds.remove(qualified)) {
// SELECT item #i referenced a must-filter column that was not
// filtered in the WHERE or HAVING. It becomes a must-filter
// column for our consumer.
mustFilterFields.set(i);
}
if (bypassQualifieds.remove(qualified)) {
// SELECT item #i referenced a bypass column that was not filtered in the WHERE
// or HAVING. It becomes a bypass column for our consumer.
mustFilterBypassFields.set(i);
projectedNonFilteredBypassField.set(0);
}
}
});

// If there are must-filter fields that are not in the SELECT clause and there were no
// bypass-fields on this table, this is an error.
if (!qualifieds.isEmpty() && !projectedNonFilteredBypassField.get(0)) {
throw newValidationError(select,
RESOURCE.mustFilterFieldsMissing(
qualifieds.stream()
.map(q -> q.suffix().get(0))
.collect(Collectors.toCollection(TreeSet::new))
.toString()));
}
// Remaining must-filter fields can be defused by a bypass-field,
// so we pass this to the consumer.
ImmutableSet<SqlQualified> remnantMustFilterFields =
Stream.of(remnantQualifieds, qualifieds)
.flatMap(Set::stream).collect(ImmutableSet.toImmutableSet());
olivrlee marked this conversation as resolved.
Show resolved Hide resolved
ns.mustFilterRequirements =
new MustFilterRequirements(ImmutableBitSet.fromBitSet(mustFilterFields),
ImmutableBitSet.fromBitSet(mustFilterBypassFields),
olivrlee marked this conversation as resolved.
Show resolved Hide resolved
remnantMustFilterFields);
}
}
}

@Override public void validateWith(SqlWith with, SqlValidatorScope scope) {
final SqlValidatorNamespace namespace = getNamespaceOrThrow(with);
validateNamespace(namespace, unknownType);
Expand Down
Loading
Loading