Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add workload param to MySQL connection string #57

Closed
wants to merge 2 commits into from

Conversation

ricardonunez-io
Copy link

For MySQL users with single-query row limits (i.e. Vitess and/or Planetscale users), the workload session variable allows them to bypass the 100,000 row limit so that DuckDB can properly load tables into memory and/or on disk without manually having to write batch queries to create a table in DuckDB from a table of 100s of thousands or millions of rows.

For MySQL users with single-query row limits (i.e. Vitess and/or Planetscale users), the workload session variable allows them to bypass the 100,000 row limit so that DuckDB can properly load tables into memory and/or on disk without manually having to write batch queries.
@Mytherin
Copy link
Contributor

Mytherin commented Apr 2, 2024

Thanks for the PR!

Perhaps I'm not following entirely, but this seems to just append the workload parameter to the host parameter. Would this not already be possible by connecting using e.g. ATTACH 'mysql:host=www.myhost.com?workload=.. ...' AS mysqldb. Could you expand a bit on why adding support for this specific setting is required/desirable?

@ricardonunez-io
Copy link
Author

ricardonunez-io commented Apr 2, 2024

Hi Mark! Sure thing, currently, the extension looks for a connection string in the format of 'key=value key=value ...', with each key/value pair separated by spaces.

If you try to append session variables to the end of the hostname/database name instead (e.g. ATTACH 'hostname=localhost?workload=olap ...' or ATTACH 'hostname=localhost database=mysql?workload=olap ...'), the key/value pairs are no longer separated strictly by spaces, but rather one of the key/value pairs is attached to another in query-param format, so the current parsing method returns an error as it was expecting a space before the next key=value not a ?. Specifically, it returns Unrecognized configuration parameter "".

Also, for reference, using mysql_execute to run something like SET workload = olap doesn't work either, because this specific session variable is not available in general MySQL (which is how this extension connects), just in the Vitess flavor.

However, I figured adding another connection parameter to the string would be both more simple (does not require a change in connection string-parsing method which is the same, to my knowledge, as the Postgres extension) and self documenting (Vitess/Planetscale users would have the variable visible in the shell).

Otherwise, if it makes more sense, I can submit a PR to change the string parsing method from expecting a series of both space separated key/value pairs and = separated keys and values to splitting the connection string by spaces first, and then again by the first = operator, making the entire second half of the =-split string the connection variable to pass to mysql_real_connect. This will allow for arbitrary session variables to be added, but will make the parsing method separate than that of the Postgres extension, which might be undesirable for other maintainers, not sure.

Also, just saw the tests fail, I'll add a commit to fix that, that was a previous version.

Thank you! And big fan of DuckDB Labs!

`db_with_workload` should've been `config.workload` appended to `config.db`, not `config.host`. Also remove unnecessary `substr(1)` method call on `workload_param`. Both errors were from previous iterations on this PR.
@Mytherin
Copy link
Contributor

Mytherin commented Apr 2, 2024

Thanks - that makes sense. I think reworking the parsing of configuration parameters so that it supports these types of parameters is cleaner. I've recently actually cleaned up the parsing so that it allows quoted values - see #48. Perhaps that already allows this to be supported?

@ricardonunez-io
Copy link
Author

No problem! Ah I didn't see that PR, that does allow this to be supported, yes, but I just tested it out, and it looks like session variables can't be appended to the database name/host this way, it looks like it might have to be done using another mysql C API. (I had vcpkg problems on my machine, even on a freshly cloned repo, and couldn't build it to test locally)

I'll look into it and submit a more general purpose PR for adding arbitrary session variables to the connection string.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants