[CALCITE-6728] Introduce new methods to lookup tables and schemas inside schemas #4100

Open: wants to merge 1 commit into main

kramerul commented Dec 19, 2024

Motivation

For databases with a huge number of schemas and tables, preparing queries takes a long time, because currently all tables and schemas are loaded into memory.

Caching all these schemas and tables is not an option:

  1. It would require a lot of memory.
  2. The cache would have to be evicted very often, since it is likely that one of these tables changes every second.

Therefore, we looked for a way to load only the tables and schemas that are required to prepare a query.

API Changes

This PR introduces a new mechanism to look up tables and schemas within a schema. For this purpose, a new interface is introduced:

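```java
public interface Lookup<T> {
  @Nullable T get(String name);                  // exact (case-sensitive) match
  @Nullable Named<T> getIgnoreCase(String name); // case-insensitive match, with its stored name
  Set<String> getNames(LikePattern pattern);     // all names matching a LIKE pattern
}
```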
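To make the contract concrete, here is a minimal sketch of a Lookup backed by an in-memory map. It is not code from the PR: Named and Lookup are simplified stand-ins declared locally, getNames takes a java.util.function.Predicate<String> instead of a LikePattern to keep the example self-contained, and the class name MapLookup is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical stand-in for the PR's Named<T>: an entity plus its stored name.
final class Named<T> {
  final String name;
  final T entity;

  Named(String name, T entity) {
    this.name = name;
    this.entity = entity;
  }
}

// Simplified Lookup: getNames takes a Predicate instead of a LikePattern.
interface Lookup<T> {
  T get(String name);                  // null if absent
  Named<T> getIgnoreCase(String name); // null if absent
  Set<String> getNames(Predicate<String> filter);
}

// Hypothetical map-backed implementation.
final class MapLookup<T> implements Lookup<T> {
  private final Map<String, T> entries;

  MapLookup(Map<String, T> entries) {
    this.entries = new LinkedHashMap<>(entries);
  }

  @Override public T get(String name) {
    return entries.get(name);
  }

  @Override public Named<T> getIgnoreCase(String name) {
    T exact = entries.get(name); // fast path: exact match
    if (exact != null) {
      return new Named<>(name, exact);
    }
    // Slow path: scan all names, comparing them case-insensitively.
    for (Map.Entry<String, T> e : entries.entrySet()) {
      if (e.getKey().equalsIgnoreCase(name)) {
        return new Named<>(e.getKey(), e.getValue());
      }
    }
    return null;
  }

  @Override public Set<String> getNames(Predicate<String> filter) {
    Set<String> result = new TreeSet<>();
    for (String n : entries.keySet()) {
      if (filter.test(n)) {
        result.add(n);
      }
    }
    return result;
  }
}
```

Returning Named<T> from getIgnoreCase lets the caller learn how the name is actually spelled, which matters once the match is no longer exact.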

LikePattern was extracted from CalciteMetaImpl. It holds a pattern that can be used with the LIKE operator to query tables and schemas in a JDBC database. It can also be converted to a Predicate1<String>, which can be used to implement filters in plain Java.
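The conversion from a LIKE pattern to a plain-Java filter can be sketched as follows. SimpleLikePattern is a hypothetical, simplified stand-in for LikePattern: it returns a java.util.function.Predicate rather than Calcite's Predicate1, ignores escape characters, and treats a null pattern as "match everything", mirroring how JDBC metadata methods treat a null schema pattern.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for LikePattern. '%' matches any sequence
// of characters and '_' matches exactly one; escape characters are not
// supported. A null pattern matches every name.
final class SimpleLikePattern {
  final String pattern;

  SimpleLikePattern(String pattern) {
    this.pattern = pattern;
  }

  // Translates the LIKE pattern into a regular expression, quoting literal runs
  // so characters such as '.' or '$' keep their literal meaning.
  static String toRegex(String like) {
    StringBuilder regex = new StringBuilder();
    StringBuilder literal = new StringBuilder();
    for (char c : like.toCharArray()) {
      if (c == '%' || c == '_') {
        regex.append(Pattern.quote(literal.toString()));
        literal.setLength(0);
        regex.append(c == '%' ? ".*" : ".");
      } else {
        literal.append(c);
      }
    }
    regex.append(Pattern.quote(literal.toString()));
    return regex.toString();
  }

  // Plain-Java filter, analogous to the PR's conversion to Predicate1<String>.
  Predicate<String> toPredicate() {
    if (pattern == null) {
      return name -> true;
    }
    Pattern compiled = Pattern.compile(toRegex(pattern), Pattern.DOTALL);
    return name -> compiled.matcher(name).matches();
  }
}
```

The same pattern string can be passed unchanged to a JDBC metadata call, while toPredicate() filters names that are already in memory.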

Schema now uses this Lookup interface to find tables and sub-schemas. The same approach could also be extended to functions and types.

```java
public interface Schema {
  default Lookup<Table> tables() {
    ...
  }
  default Lookup<? extends Schema> subSchemas() {
    ...
  }
  ...
}
```
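With lookups on both tables and sub-schemas, resolving a qualified name such as SALES.EMP becomes a walk through the schema tree that only touches the names on the path. The sketch below assumes minimal local stand-ins for Schema, Table, Lookup and Named (the real interfaces have more members); the class Resolver and its caseSensitive flag are hypothetical.

```java
// Hypothetical stand-ins so the sketch compiles on its own.
final class Named<T> {
  final String name;
  final T entity;

  Named(String name, T entity) {
    this.name = name;
    this.entity = entity;
  }
}

interface Lookup<T> {
  T get(String name);
  Named<T> getIgnoreCase(String name);
}

interface Table {}

interface Schema {
  Lookup<Table> tables();
  Lookup<? extends Schema> subSchemas();
}

final class Resolver {
  // Looks up one name, either exactly or ignoring case.
  static <T> T find(Lookup<? extends T> lookup, String name, boolean caseSensitive) {
    if (caseSensitive) {
      return lookup.get(name);
    }
    Named<? extends T> named = lookup.getIgnoreCase(name);
    return named == null ? null : named.entity;
  }

  // Resolves a non-empty path such as {"SALES", "EMP"}: every element but the
  // last names a sub-schema, the last names a table. Returns null if absent.
  static Table resolve(Schema root, String[] path, boolean caseSensitive) {
    Schema schema = root;
    for (int i = 0; i < path.length - 1; i++) {
      schema = find(schema.subSchemas(), path[i], caseSensitive);
      if (schema == null) {
        return null;
      }
    }
    return find(schema.tables(), path[path.length - 1], caseSensitive);
  }
}
```

Only the schemas and the table named in the path are requested, which is what allows a schema implementation to avoid loading everything else.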

Implementation

Case-insensitive search is now implemented directly in each specific Schema, through a matching implementation of the Lookup interface. Formerly, it was done in CalciteSchema.
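Because each Schema now supplies its own Lookup, it can pick a case-insensitive strategy that suits its data. One option, sketched here under the assumption that all entries are held in memory, is an index ordered by String.CASE_INSENSITIVE_ORDER, which answers getIgnoreCase in O(log n) instead of scanning every name. IgnoreCaseMapLookup and the local Named class are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the PR's Named<T>: an entity plus its stored name.
final class Named<T> {
  final String name;
  final T entity;

  Named(String name, T entity) {
    this.name = name;
    this.entity = entity;
  }
}

// Hypothetical in-memory lookup with a case-insensitive index.
final class IgnoreCaseMapLookup<T> {
  private final Map<String, T> byName = new HashMap<>();
  // Keys compare with String.CASE_INSENSITIVE_ORDER, so "emp" finds "EMP".
  // If two stored names differ only in case, the index keeps the one added last.
  private final TreeMap<String, String> storedName =
      new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

  void put(String name, T entity) {
    byName.put(name, entity);
    storedName.put(name, name);
  }

  T get(String name) {
    return byName.get(name);
  }

  Named<T> getIgnoreCase(String name) {
    if (byName.containsKey(name)) { // prefer an exact match
      return new Named<>(name, byName.get(name));
    }
    String stored = storedName.get(name); // O(log n) instead of a full scan
    return stored == null ? null : new Named<>(stored, byName.get(stored));
  }
}
```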

JdbcSchema and JdbcCatalogSchema use a special implementation of Lookup, LoadingCacheLookup, which uses a LoadingCache internally to speed up lookups. A case-sensitive schema or table lookup is fast, because DatabaseMetaData#getTables can be used to query a single table. The result is cached in the LoadingCache for one minute.
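The idea behind LoadingCacheLookup, loading one name on demand and keeping the answer for a limited time, can be sketched without Guava. ExpiringLoadingLookup is a hypothetical stand-in for a LoadingCache-based lookup: it caches misses as well as hits and expires entries after ttlMillis. The hypothetical helper tableExists shows how DatabaseMetaData#getTables can be asked about a single table; since getTables takes LIKE patterns, the name is escaped and the returned names are still compared exactly.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a LoadingCache-backed lookup: each name is loaded
// on first use and reloaded once its cached entry is older than ttlMillis.
final class ExpiringLoadingLookup<T> {
  private static final class Entry<V> {
    final Optional<V> value; // empty = "not found", which is cached too
    final long loadedAt;

    Entry(Optional<V> value, long loadedAt) {
      this.value = value;
      this.loadedAt = loadedAt;
    }
  }

  private final Map<String, Entry<T>> cache = new ConcurrentHashMap<>();
  private final Function<String, Optional<T>> loader;
  private final long ttlMillis;
  private final LongSupplier clock; // injectable so tests can control time

  ExpiringLoadingLookup(Function<String, Optional<T>> loader, long ttlMillis,
      LongSupplier clock) {
    this.loader = loader;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  // Unlike Guava's LoadingCache, concurrent callers may load the same name twice.
  T get(String name) {
    long now = clock.getAsLong();
    Entry<T> entry = cache.get(name);
    if (entry == null || now - entry.loadedAt >= ttlMillis) {
      entry = new Entry<>(loader.apply(name), now);
      cache.put(name, entry);
    }
    return entry.value.orElse(null);
  }

  // Asks the database about one table instead of listing all of them.
  static boolean tableExists(DatabaseMetaData md, String catalog,
      String schemaPattern, String table) throws SQLException {
    String esc = md.getSearchStringEscape();
    String pattern = table;
    if (esc != null && !esc.isEmpty()) { // escape LIKE wildcards in the name
      pattern = table.replace(esc, esc + esc)
          .replace("_", esc + "_")
          .replace("%", esc + "%");
    }
    try (ResultSet rs = md.getTables(catalog, schemaPattern, pattern, null)) {
      while (rs.next()) {
        if (table.equals(rs.getString("TABLE_NAME"))) {
          return true;
        }
      }
    }
    return false;
  }
}
```

With the one-minute expiry described in the PR, this would be constructed with ttlMillis = 60_000 and System::currentTimeMillis as the clock, with a loader that calls something like tableExists and wraps the result.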

Unfortunately, DatabaseMetaData#getTables doesn't support case-insensitive queries, so a case-insensitive lookup still requires loading all tables from the database.
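A case-insensitive lookup therefore comes down to listing every table and filtering in Java. The sketch below shows both halves; CaseInsensitiveFallback and its methods are hypothetical, and preferring an exact match over other case variants is an assumption, not necessarily what the PR does.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers illustrating the case-insensitive fallback.
final class CaseInsensitiveFallback {
  // Lists every table in a schema: "%" matches any table name, so the whole
  // name list is transferred, which is the cost a case-insensitive lookup pays.
  static List<String> allTableNames(DatabaseMetaData md, String catalog,
      String schemaPattern) throws SQLException {
    List<String> names = new ArrayList<>();
    try (ResultSet rs = md.getTables(catalog, schemaPattern, "%", null)) {
      while (rs.next()) {
        names.add(rs.getString("TABLE_NAME"));
      }
    }
    return names;
  }

  // Returns the stored name equal to `name` ignoring case, preferring an
  // exact match; null if there is none.
  static String matchIgnoreCase(List<String> names, String name) {
    String candidate = null;
    for (String n : names) {
      if (n.equals(name)) {
        return n;
      }
      if (candidate == null && n.equalsIgnoreCase(name)) {
        candidate = n;
      }
    }
    return candidate;
  }
}
```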

The performance gain for database schemas containing huge numbers of tables or sub-schemas can only be achieved if caching is turned off in Calcite, that is, if SimpleCalciteSchema is used instead of CachingCalciteSchema.

I tried to keep the behavior of CachingCalciteSchema exactly the same, including the fact that it loads all tables and schemas into memory. CachedLookup is used to achieve this.
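For comparison, the load-everything behavior that CachingCalciteSchema keeps can be sketched as a lookup that materializes all entries on first access. EagerCachedLookup is a hypothetical illustration of that idea, not the PR's CachedLookup, and it never refreshes its snapshot.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration of eager caching: on first access every name is
// listed and every entity loaded, and all later calls are served from memory.
// This sketch never refreshes its snapshot.
final class EagerCachedLookup<T> {
  private final Supplier<Set<String>> listNames;
  private final Function<String, T> loadOne;
  private Map<String, T> snapshot; // null until first access

  EagerCachedLookup(Supplier<Set<String>> listNames, Function<String, T> loadOne) {
    this.listNames = listNames;
    this.loadOne = loadOne;
  }

  private synchronized Map<String, T> snapshot() {
    if (snapshot == null) {
      Map<String, T> all = new HashMap<>();
      for (String name : listNames.get()) {
        all.put(name, loadOne.apply(name));
      }
      snapshot = all;
    }
    return snapshot;
  }

  T get(String name) {
    return snapshot().get(name);
  }

  Set<String> names() {
    return Collections.unmodifiableSet(snapshot().keySet());
  }
}
```

The first get pays for every table at once, which is exactly the cost that the on-demand lookups avoid when caching is turned off.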

kramerul marked this pull request as ready for review January 2, 2025 08:07