[CALCITE-6728] Introduce new methods to lookup tables and schemas inside schemas #4100

kramerul · 2024-12-19T13:19:22Z

Motivation

For databases with a very large number of schemas and tables, preparing a query takes a long time. Currently, all tables and schemas are loaded into memory.

Caching all of these schemas and tables is not an option:

- It would require a lot of memory.
- The cache would have to be evicted very often, since it is likely that one of these tables changes every second.

Therefore, we tried to find a way to load only those tables and schemas that are required to prepare a query.

API Changes

This PR introduces a new mechanism to look up tables and schemas within a schema. For this purpose, a new interface is introduced:

public interface Lookup<T> {
  @Nullable T get(String name);
  @Nullable Named<T> getIgnoreCase(String name);
  Set<String> getNames(LikePattern pattern);
}

LikePattern was extracted from CalciteMetaImpl. It holds a pattern that can be used to query tables and schemas inside a JDBC database using the LIKE operator. It can also be converted to a Predicate1<String>, which can be used to implement filters in plain Java.
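To illustrate the conversion from a LIKE pattern to a filter, here is a minimal, self-contained sketch. It is not the code from this PR: the class name `LikePatternSketch` is hypothetical, it uses `java.util.function.Predicate` in place of Calcite's `Predicate1`, and it assumes only the `%` and `_` wildcards with no escape character.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical stand-in showing how a LIKE pattern can become a Java filter. */
public class LikePatternSketch {
  /**
   * Converts a SQL LIKE pattern into a predicate on strings.
   * '%' matches any run of characters and '_' matches exactly one.
   * Escape characters are not handled in this sketch.
   */
  public static Predicate<String> toPredicate(String like) {
    StringBuilder regex = new StringBuilder();
    StringBuilder literal = new StringBuilder();
    for (char c : like.toCharArray()) {
      if (c == '%' || c == '_') {
        // Flush pending literal text, quoted so regex metacharacters stay literal.
        if (literal.length() > 0) {
          regex.append(Pattern.quote(literal.toString()));
          literal.setLength(0);
        }
        regex.append(c == '%' ? ".*" : ".");
      } else {
        literal.append(c);
      }
    }
    if (literal.length() > 0) {
      regex.append(Pattern.quote(literal.toString()));
    }
    Pattern compiled = Pattern.compile(regex.toString(), Pattern.DOTALL);
    return s -> compiled.matcher(s).matches();
  }
}
```

With this sketch, `LikePatternSketch.toPredicate("EMP%")` accepts `EMPLOYEES` and rejects `DEPT`.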

Schema now uses this Lookup interface to find schemas and tables. The same approach could also be extended to functions and types.

public interface Schema {
  default Lookup<Table> tables() {
    ...
  }
  default Lookup<? extends Schema> subSchemas() {
    ...
  }
  ...
}
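As a rough illustration of the Lookup contract, the following sketch backs the three operations with an in-memory map. It is a simplified stand-in, not the implementation in this PR: `MapLookupSketch` and its nested `Named` class are hypothetical, and `getNames` takes a plain `java.util.function.Predicate` instead of a `LikePattern` so that the example compiles without Calcite on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical map-backed lookup mirroring get, getIgnoreCase and getNames. */
public class MapLookupSketch<T> {
  /** Pairs an entity with the exact name it is stored under. */
  public static final class Named<T> {
    public final String name;
    public final T entity;

    Named(String name, T entity) {
      this.name = name;
      this.entity = entity;
    }
  }

  private final Map<String, T> entries = new LinkedHashMap<>();

  public MapLookupSketch<T> put(String name, T entity) {
    entries.put(name, entity);
    return this;
  }

  /** Case-sensitive lookup; returns null when the name is absent. */
  public T get(String name) {
    return entries.get(name);
  }

  /** Case-insensitive lookup; returns the first match with its stored name, or null. */
  public Named<T> getIgnoreCase(String name) {
    for (Map.Entry<String, T> e : entries.entrySet()) {
      if (e.getKey().equalsIgnoreCase(name)) {
        return new Named<>(e.getKey(), e.getValue());
      }
    }
    return null;
  }

  /** Returns the stored names accepted by the filter, in sorted order. */
  public Set<String> getNames(Predicate<String> filter) {
    Set<String> names = new TreeSet<>();
    for (String n : entries.keySet()) {
      if (filter.test(n)) {
        names.add(n);
      }
    }
    return names;
  }
}
```

Returning the stored name from `getIgnoreCase` lets the caller learn the exact spelling under which the entity is registered, which a plain `T` return value would not reveal.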

Implementation

Case-insensitive search is now implemented directly in the specific Schema, using a matching implementation of the Lookup interface. Formerly, it was done in CalciteSchema.

JdbcSchema and JdbcCatalogSchema use a special implementation of Lookup: LoadingCacheLookup. This implementation uses a LoadingCache internally to speed up lookups. If only case-sensitive schema/table lookup is required, it is quite fast, since DatabaseMetaData#getTables can be used to query a single table. The result is cached in the LoadingCache for one minute.
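The idea of loading a single entry on demand and keeping it for a limited time can be sketched as follows. This is not LoadingCacheLookup itself: the class `ExpiringLookupSketch` is hypothetical, a `ConcurrentHashMap` with timestamps is swapped in for Guava's LoadingCache to keep the example JDK-only, the loader function stands in for a `DatabaseMetaData#getTables` call, and the choice not to cache misses is an assumption of the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical per-name cache that reloads an entry once its time-to-live has elapsed. */
public class ExpiringLookupSketch<T> {
  private static final class Entry<T> {
    final T value;
    final long loadedAtMillis;

    Entry(T value, long loadedAtMillis) {
      this.value = value;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final Map<String, Entry<T>> cache = new ConcurrentHashMap<>();
  private final Function<String, T> loader; // stand-in for a single-table metadata query
  private final long ttlMillis;
  private final LongSupplier clock;         // injectable so expiry can be tested

  public ExpiringLookupSketch(Function<String, T> loader, long ttlMillis, LongSupplier clock) {
    this.loader = loader;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /** Case-sensitive lookup; loads only the requested name and never the full table list. */
  public T get(String name) {
    long now = clock.getAsLong();
    Entry<T> entry = cache.get(name);
    if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
      T value = loader.apply(name);
      if (value == null) {
        // Misses are not cached in this sketch.
        cache.remove(name);
        return null;
      }
      entry = new Entry<>(value, now);
      cache.put(name, entry);
    }
    return entry.value;
  }
}
```

A one-minute expiry as described above would correspond to constructing the sketch with `60_000L` as `ttlMillis` and `System::currentTimeMillis` as the clock.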

Unfortunately, DatabaseMetaData#getTables doesn't support case-insensitive queries. In this case, all database tables still have to be loaded to perform case-insensitive lookups.

The performance gain for database schemas with huge sets of tables/schemas can only be achieved if caching is turned off in Calcite (that is, SimpleCalciteSchema is used instead of CachingCalciteSchema).

I tried to keep the behavior of CachingCalciteSchema exactly the same, including the fact that all tables and schemas are loaded into memory. CachedLookup is used to achieve this.

sonarqubecloud · 2024-12-19T13:49:05Z

Quality Gate passed

Issues
20 New issues
0 Accepted issues

Measures
0 Security Hotspots
82.0% Coverage on New Code
0.2% Duplication on New Code

See analysis details on SonarQube Cloud

[CALCITE-6728] Introduce new methods to lookup tables and schemas inside schemas

f95b084

kramerul marked this pull request as ready for review January 2, 2025 08:07
