Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fails on binary field data #1

Open
HerbCSO opened this issue Feb 11, 2015 · 11 comments
Open

Fails on binary field data #1

HerbCSO opened this issue Feb 11, 2015 · 11 comments

Comments

@HerbCSO
Copy link

HerbCSO commented Feb 11, 2015

MySQL allows storing "binary" data types. mysqldump outputs these as byte strings. This results in invalid UTF-8 sequences, which cause the following stack trace when the dump file is piped through hasten:

$ time zcat mysqldump.sql.gz | hasten | mysql test_hasten
/usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/command.rb:6:in `match': invalid byte sequence in UTF-8 (ArgumentError)
        from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/command.rb:6:in `match'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/command.rb:6:in `complete?'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/dump.rb:36:in `parse_command'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/dump.rb:13:in `execute'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/bin/hasten:16:in `<top (required)>'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/hasten:23:in `load'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/hasten:23:in `<main>'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/ruby_executable_hooks:15:in `eval'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/ruby_executable_hooks:15:in `<main>'

I imagine it would be rather difficult to identify these binary strings and treat the separately in order to avoid this. I guess the other alternative would be to load the file as ASCII 8-bit, but then I'm not sure what would happen with valid UTF-8 characters...

Bit of a pickle, not sure if there is a good solution, but wanted to at least get the issue out there. I was trying it out on one of our dumps to see if it would speed things up on restore, which we sorely need, but this is unfortunately a blocker for using hasten.

@thirtysixthspan
Copy link
Owner

Thanks for the report! If you could make available an example dump that contained such a binary string, I could tackle it. Or if you want to issue a PR I would gladly review it.

@HerbCSO
Copy link
Author

HerbCSO commented Feb 14, 2015

Test file available at https://gist.github.com/HerbCSO/2b4213cdc1e81df121e8 (uuencoded). I'd love to do a PR, but as I mentioned I'm not really sure how to approach it.

To reproduce, once you have the file uudecoded, run cat binary_data.sql | hasten | mysql binary_test. It should produce the following error:

/Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/command.rb:6:in `match': invalid byte sequence in UTF-8 (ArgumentError)
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/command.rb:6:in `match'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/command.rb:6:in `complete?'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/dump.rb:36:in `parse_command'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/dump.rb:13:in `execute'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/bin/hasten:16:in `<top (required)>'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/bin/hasten:23:in `load'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/bin/hasten:23:in `<main>'

@paneq
Copy link

paneq commented Mar 19, 2015

+1

2 similar comments
@martinille
Copy link

+1

@aweis89
Copy link

aweis89 commented Jul 28, 2017

+1

@meme-lord
Copy link

If anyone is interested I've modified it here: #2
It replaces characters causing issues with '?'

@kbresin
Copy link

kbresin commented Sep 10, 2019

adding mysqldump option: --hex-blob seems to be a workaround for this issue at the moment.

@meme-lord
Copy link

I've made a bash script here: https://github.com/meme-lord/hasten_py/blob/master/hasten.sh
That basically does what hasten does. hasten is a lot slower because it parses all the SQL for no real reason. This also has no errors with encoding or binary data.

@kbresin
Copy link

kbresin commented Sep 10, 2019

Interesting script, although that isn't quite what hasten does.

Hasten removes the key definitions on new tables, and then adds them back in at the end via alters.

Your script is interesting, and I think the auto-commit 0 angle is interesting, but for very large databases I'm not sure one commit at the very end is a great idea.

I am probably going to write my own script that disables auto-commit, and then has a configurable number of INSERT lines to add COMMITs after.

To provide a happy medium between:
"COMMIT after every single INSERT"
and
"COMMIT only once at the end!"

Which I think should speed up large table innodb INSERTs considerably, without hasten's sort of complex approach.

@meme-lord
Copy link

That's an excellent idea. I can try implement it the next time I need to do a large import.
I didn't spot the key definition thing last time I read through the source.
Do you think grouping inserts would also give a speedup?
Some SQL dumps I've seen have one insert statement per row.

@kbresin
Copy link

kbresin commented Sep 10, 2019

Do you think grouping inserts would also give a speedup?

It definitely does, but I think that is best handled by mysqldump's --extended-insert flag :

Write INSERT statements using multiple-row syntax that includes several VALUES lists.
This results in a smaller dump file and speeds up inserts when the file is reloaded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants