-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: Implementation of batch ddl in dbapi #1092
base: main
Are you sure you want to change the base?
Conversation
@@ -61,7 +106,7 @@ def execute_statement(self, parsed_statement: ParsedStatement): | |||
raise ProgrammingError("Only DML statements are allowed in batch DML mode.") | |||
self._statements.append(parsed_statement.statement) | |||
|
|||
def run_batch_dml(self): | |||
def run_batch(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a breaking change?
We can probably get away with it now, but this is something that we need to be more careful with, and we should try to mark whatever is not for public consumption as such as much as possible.
Either by adding a _
at the start of the method name, or by explicitly adding documentation that the method is not for public use, and can change at any moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. Added the following documentation on the method This method is internal and not for public use as it can change anytime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of renaming the method, we could mark it as deprecated and create a new one with the _
prefix which indicates that it is internal. Please avoid a major version bump if at all possible as it is disruptive for downstream users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the reason because of which we are proposing a major version bump. Please check https://docs.google.com/document/d/1mSVC_AhPpdSA0xUoj4LUt2o2tVM_LDv6IdssiwWnbmw for details
@@ -126,6 +134,15 @@ def autocommit(self): | |||
""" | |||
return self._autocommit | |||
|
|||
@property | |||
def buffer_ddl_statements(self): | |||
"""Whether to buffer ddl statements for this connection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs more documentation. I (think I) understand what it means. To someone who is not familiar with Cloud Spanner, it is not clear what this means. It is also not clear when they should enable/disable this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -91,7 +95,9 @@ class Connection: | |||
should end a that a new one should be started when the next statement is executed. | |||
""" | |||
|
|||
def __init__(self, instance, database=None, read_only=False): | |||
def __init__( | |||
self, instance, database=None, read_only=False, buffer_ddl_statements=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buffer_ddl_statements=False
means that we are changing the default behavior in a significant way. That requires a major version bump, and should be clearly called out in the release notes. The description of the PR will automatically be included in the release notes, so please add information about this change there.
Also, in order to force a major version bump, you can add a !
to the title like this: feat!: Support explicit DDL batching
(Also note that the title of the PR will be included in the release notes, and as such is intended for public consumption. We should therefore try to keep that title as descriptive as possible for external users so they understand what the change is.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added description and changed title to force a major version bump
self._batch_ddl_executor = BatchDdlExecutor(self) | ||
|
||
@check_not_closed | ||
def execute_batch_ddl_statement(self, parsed_statement: ParsedStatement): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this method to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment that its an internal method to this and other methods as well
tests/system/test_dbapi.py
Outdated
if auto_commit: | ||
self._conn.autocommit = True | ||
|
||
self._cursor.execute("start batch ddl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make more sense if this statement already causes an error when auto_commit=False
, instead of letting the error only happen when run batch
is executed. Or is there some scenario where this could become a valid DDL batch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what if during start batch ddl
auto_commit was True but at run batch
it was False. The reverse is not possible as if it was False during start batch ddl
then the connection is in transaction and one cant change the auto commit mode when a transaction is in progress
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that my main takeaway from that is: It should not be possible to change the value of auto_commit
while a batch is in progress. That should be the case for both for DDL batches and DML batches.
Take for example DML batches, and assume the following script:
set auto_commit=True;
start batch dml;
insert into foo (..);
set auto_commit=False;
insert into bar (...);
run batch;
rollback;
What should be the result of the above script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change to not allow changing autocommit mode when a batch is in progress. Also raising error when starting a ddl batch when connection is not in autocommit mode
tests/system/test_dbapi.py
Outdated
self._cursor.execute("DROP TABLE Table_1") | ||
assert table_2.exists() is True | ||
self._cursor.execute("DROP TABLE Table_2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: combine the two DROP TABLE
statements into one batch, that is more efficient to execute on Spanner. (So first verify that both tables exist, and then drop them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/system/test_dbapi.py
Outdated
@pytest.mark.parametrize("autocommit", [True, False]) | ||
def test_ddl_execute(self, autocommit, dbapi_database): | ||
"""Check that DDL statement results in successful execution for execute | ||
method in autocommit mode while it's a noop in non-autocommit mode.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should include the fact that the connection also uses buffered DDL. And that again means that the DDL statement is not a no-op (I think), but it is being buffered. And the test should then verify tha the statement is actually buffered, and that it is not really a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/system/test_dbapi.py
Outdated
@@ -1196,7 +1232,7 @@ def test_ddl_executemany_autocommit_false(self, dbapi_database): | |||
table = dbapi_database.table("DdlExecuteManyAutocommit") | |||
assert table.exists() is False | |||
|
|||
def test_ddl_execute(self, dbapi_database): | |||
def test_ddl_then_non_ddl_execute(self, dbapi_database): | |||
"""Check that DDL statement followed by non-DDL execute statement in | |||
non autocommit mode results in successful DDL statement execution.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also update the comments of these test cases, that they require buffer_ddl=True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/system/test_dbapi.py
Outdated
@@ -1339,31 +1375,23 @@ def test_json_array(self, dbapi_database): | |||
op = dbapi_database.update_ddl(["DROP TABLE JsonDetails"]) | |||
op.result() | |||
|
|||
@pytest.mark.noautofixt | |||
def test_DDL_commit(self, shared_instance, dbapi_database): | |||
def test_DDL_commit(self, dbapi_database): | |||
"""Check that DDLs in commit mode are executed on calling `commit()`.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this also is only true if buffer_ddl=True
. In other cases, this would cause an error (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats correct. Added the comment
"""Executes all the buffered statements on the active dml batch by | ||
making a call to Spanner. | ||
|
||
This method is internal and not for public use as it can change anytime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: here and elsewhere
This method is internal and not for public use as it can change anytime. | |
This method is internal and not for public use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be that you have not pushed that change yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I did not. Thanks done
If the connection is in auto commit mode then this flag doesn't have any | ||
significance as ddl statements would be executed as they come. | ||
|
||
For connection not in auto commit mode: | ||
If enabled ddl statements would be buffered at client and not executed | ||
at cloud spanner. When a non ddl statement comes or a transaction is | ||
committed then all the existing buffered ddl statements would be executed. | ||
|
||
If disabled then its an error to execute ddl statement in autocommit mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the connection is in auto commit mode then this flag doesn't have any | |
significance as ddl statements would be executed as they come. | |
For connection not in auto commit mode: | |
If enabled ddl statements would be buffered at client and not executed | |
at cloud spanner. When a non ddl statement comes or a transaction is | |
committed then all the existing buffered ddl statements would be executed. | |
If disabled then its an error to execute ddl statement in autocommit mode. | |
This flag determines how DDL statements are handled when auto_commit=False: | |
1. buffer_ddl_statements=True: DDL statements are buffered in the client until the | |
next non-DDL statement, or until the transaction is committed. Executing a | |
non-DDL statement causes the connection to send all buffered DDL statements | |
to Spanner, and then to execute the non-DDL statement. Note that although the | |
DDL statements are sent as one batch to Spanner, they are not guaranteed to be | |
atomic. See https://cloud.google.com/spanner/docs/reference/rpc/google.spanner.admin.database.v1#google.spanner.admin.database.v1.UpdateDatabaseDdlRequest | |
for more details on DDL batches. | |
2. buffer_ddl_statements=False: Executing DDL statements is not allowed and an | |
error will be raised. When the connection is in auto_commit=False mode, a | |
transaction is active, and DDL statements cannot be executed in a transaction. | |
This flag is ignored when auto_commit=True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -152,7 +154,7 @@ def test_commit_exception(self): | |||
try: | |||
self._conn.commit() | |||
except Exception: | |||
pass | |||
self._conn.database._pool._sessions.get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, would you mind adding a comment to the code here to explain why we are doing this.
tests/system/test_dbapi.py
Outdated
if auto_commit: | ||
self._conn.autocommit = True | ||
|
||
self._cursor.execute("start batch ddl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that my main takeaway from that is: It should not be possible to change the value of auto_commit
while a batch is in progress. That should be the case for both for DDL batches and DML batches.
Take for example DML batches, and assume the following script:
set auto_commit=True;
start batch dml;
insert into foo (..);
set auto_commit=False;
insert into bar (...);
run batch;
rollback;
What should be the result of the above script?
We are also adding a boolean property
buffer_ddl_statements
(disabled by default) which is changing the behavior of dbapi as currently it behave as per the property being enabled. We need to override the flag to enable it in SqlAlchemy and Django. More details about this flag below:If the connection is in auto commit mode then this flag doesn't have any significance as ddl statements would be executed as they come.
For connection not in auto commit mode:
More details here https://docs.google.com/document/d/1mSVC_AhPpdSA0xUoj4LUt2o2tVM_LDv6IdssiwWnbmw