-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add iceberg catalog read support #98
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I've added a few comments
CMakeLists.txt
Outdated
@@ -4,7 +4,7 @@ cmake_minimum_required(VERSION 2.8.12) | |||
set(TARGET_NAME iceberg) | |||
project(${TARGET_NAME}) | |||
|
|||
set(CMAKE_CXX_STANDARD 14) | |||
set(CMAKE_CXX_STANDARD 17) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need to bump this? I would prefer not to, to avoid any CI issues
src/catalog_api.cpp
Outdated
} | ||
return result; | ||
|
||
// throw std::runtime_error("No AWS credentials found for table"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can be removed
src/storage/ic_transaction.cpp
Outdated
} | ||
} | ||
|
||
// ICConnection &ICTransaction::GetConnection() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's just remove all commented out code. This has been copied from the uc_catalog extension which has a prototype-quality level codebase
require httpfs | ||
|
||
statement ok | ||
CREATE SECRET ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine for a first PR, but we should think about the UX a little here: right now there is only 1 iceberg secret which is then fetched when calling ATTACH '' AS my_datalake (TYPE ICEBERG);
However it's probably desirable to be able to do something like:
CREATE SECRET irc_secret_1 (TYPE ICEBERG, ENDPOINT 'http://127.0.0.1:8181', BEARER_TOKEN 'bla');
CREATE SECRET irc_secret_2 (TYPE ICEBERG, ENDPOINT 'http://some.other.thing.com', BEARER_TOKEN 'bla');
ATTACH 'irc_secret_1' AS irc1 (TYPE ICEBERG);
ATTACH 'irc_secret_2' AS irc2 (TYPE ICEBERG);
or perhaps
CREATE SECRET irc_secret_1 (TYPE ICEBERG, SCOPE 'http://127.0.0.1:8181', BEARER_TOKEN 'bla');
CREATE SECRET irc_secret_2 (TYPE ICEBERG, SCOPE 'http://some.other.thing.com', BEARER_TOKEN 'bla');
ATTACH 'http://127.0.0.1:8181' AS irc1 (TYPE ICEBERG);
ATTACH 'http://some.other.thing.com' AS irc2 (TYPE ICEBERG);
We can have some discussion on what is nicest, these are currently also open questions for other catalog extensions such as postgres and mysql i think
} | ||
|
||
auto &ic_catalog = catalog.Cast<ICCatalog>(); | ||
// TODO: handle out-of-order columns using position property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO is copy pasted from uc_catalog: we should figure out if this is also relevant to iceberg, if so it can stay otherwise we should remove it to avoid confusion
auto &get = (LogicalGet &)*op; | ||
bind_data = std::move(get.bind_data); | ||
|
||
return parquet_scan_function; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this doesn't work with tables with deletes and potentially return incorrect results. Can we add a test for this?
If it does indeed fail, we should take a look into how to solve this, or at least throw an error
src/storage/ic_table_entry.cpp
Outdated
auto table_ref = iceberg_scan_function.bind_replace(context, bind_input); | ||
|
||
// 1) Create a Binder and bind the parser-level TableRef -> BoundTableRef | ||
auto binder = Binder::CreateBinder(context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: formatting
src/storage/ic_table_entry.cpp
Outdated
throw NotImplementedException("BindUpdateConstraints"); | ||
} | ||
|
||
struct MyIcebergFunctionData : public FunctionData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is dead code i think?
src/common/utils.cpp | ||
src/common/schema.cpp | ||
src/common/iceberg.cpp | ||
src/iceberg_functions/iceberg_snapshots.cpp | ||
src/iceberg_functions/iceberg_scan.cpp | ||
src/iceberg_functions/iceberg_metadata.cpp) | ||
src/iceberg_functions/iceberg_metadata.cpp | ||
src/storage/ic_catalog.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this structure could be a bit clearer
What do you think about renaming the IC to IRC (this code is meant for the Iceberg Rest Catalog after all) and moving everything:
src/storage/irc/irc_catalog.cpp
?
This would allow us to later on create another catalog which can handle static iceberg tables, similar to duckdb/duckdb-delta#110 which could then live in src/storage/static/static_catalog.cpp
Piggy backing off of #95
This PR adds support for attaching an Iceberg catalog and be able to read iceberg tables from a datalake. It specifically supports the following sql commands:
Attach your catalog
Some tests as well to make sure everything works. There is also a double check on the config since it's possible some catalogs won't return configs. In this case the user is required to have a second key.