Fails on binary field data #1

HerbCSO · 2015-02-11T14:49:43Z

MySQL allows storing "binary" data types. mysqldump outputs these as byte strings. This results in invalid UTF-8 sequences, which cause the following stack trace when the dump file is piped through hasten:

$ time zcat mysqldump.sql.gz | hasten | mysql test_hasten
/usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/command.rb:6:in `match': invalid byte sequence in UTF-8 (ArgumentError)
        from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/command.rb:6:in `match'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/command.rb:6:in `complete?'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/dump.rb:36:in `parse_command'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/lib/hasten/dump.rb:13:in `execute'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/gems/hasten-1.0.0/bin/hasten:16:in `<top (required)>'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/hasten:23:in `load'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/hasten:23:in `<main>'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/ruby_executable_hooks:15:in `eval'
        from /usr/local/rvm/gems/ruby-1.9.3-p484/bin/ruby_executable_hooks:15:in `<main>'

I imagine it would be rather difficult to identify these binary strings and treat the separately in order to avoid this. I guess the other alternative would be to load the file as ASCII 8-bit, but then I'm not sure what would happen with valid UTF-8 characters...

Bit of a pickle, not sure if there is a good solution, but wanted to at least get the issue out there. I was trying it out on one of our dumps to see if it would speed things up on restore, which we sorely need, but this is unfortunately a blocker for using hasten.

The text was updated successfully, but these errors were encountered:

thirtysixthspan · 2015-02-13T00:31:52Z

Thanks for the report! If you could make available an example dump that contained such a binary string, I could tackle it. Or if you want to issue a PR I would gladly review it.

HerbCSO · 2015-02-14T03:58:12Z

Test file available at https://gist.github.com/HerbCSO/2b4213cdc1e81df121e8 (uuencoded). I'd love to do a PR, but as I mentioned I'm not really sure how to approach it.

To reproduce, once you have the file uudecoded, run cat binary_data.sql | hasten | mysql binary_test. It should produce the following error:

/Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/command.rb:6:in `match': invalid byte sequence in UTF-8 (ArgumentError)
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/command.rb:6:in `match'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/command.rb:6:in `complete?'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/dump.rb:36:in `parse_command'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/lib/hasten/dump.rb:13:in `execute'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/gems/hasten-1.0.1/bin/hasten:16:in `<top (required)>'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/bin/hasten:23:in `load'
        from /Users/carsten.dreesbach/.rvm/gems/ruby-1.9.3-p551/bin/hasten:23:in `<main>'

paneq · 2015-03-19T18:00:04Z

+1

martinille · 2016-03-18T20:33:28Z

+1

aweis89 · 2017-07-28T00:43:10Z

+1

meme-lord · 2017-07-29T22:28:35Z

If anyone is interested I've modified it here: #2
It replaces characters causing issues with '?'

kbresin · 2019-09-10T13:55:43Z

adding mysqldump option: --hex-blob seems to be a workaround for this issue at the moment.

meme-lord · 2019-09-10T15:21:03Z

I've made a bash script here: https://github.com/meme-lord/hasten_py/blob/master/hasten.sh
That basically does what hasten does. hasten is a lot slower because it parses all the SQL for no real reason. This also has no errors with encoding or binary data.

kbresin · 2019-09-10T15:37:46Z

Interesting script, although that isn't quite what hasten does.

Hasten removes the key definitions on new tables, and then adds them back in at the end via alters.

Your script is interesting, and I think the auto-commit 0 angle is interesting, but for very large databases I'm not sure one commit at the very end is a great idea.

I am probably going to write my own script that disables auto-commit, and then has a configurable number of INSERT lines to add COMMITs after.

To provide a happy medium between:
"COMMIT after every single INSERT"
and
"COMMIT only once at the end!"

Which I think should speed up large table innodb INSERTs considerably, without hasten's sort of complex approach.

meme-lord · 2019-09-10T16:16:08Z

That's an excellent idea. I can try implement it the next time I need to do a large import.
I didn't spot the key definition thing last time I read through the source.
Do you think grouping inserts would also give a speedup?
Some SQL dumps I've seen have one insert statement per row.

kbresin · 2019-09-10T18:53:15Z

Do you think grouping inserts would also give a speedup?

It definitely does, but I think that is best handled by mysqldump's --extended-insert flag :

Write INSERT statements using multiple-row syntax that includes several VALUES lists.
This results in a smaller dump file and speeds up inserts when the file is reloaded.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fails on binary field data #1

Fails on binary field data #1

HerbCSO commented Feb 11, 2015

thirtysixthspan commented Feb 13, 2015

HerbCSO commented Feb 14, 2015

paneq commented Mar 19, 2015

martinille commented Mar 18, 2016

aweis89 commented Jul 28, 2017

meme-lord commented Jul 29, 2017

kbresin commented Sep 10, 2019

meme-lord commented Sep 10, 2019

kbresin commented Sep 10, 2019

meme-lord commented Sep 10, 2019

kbresin commented Sep 10, 2019 •

edited

Loading

Fails on binary field data #1

Fails on binary field data #1

Comments

HerbCSO commented Feb 11, 2015

thirtysixthspan commented Feb 13, 2015

HerbCSO commented Feb 14, 2015

paneq commented Mar 19, 2015

martinille commented Mar 18, 2016

aweis89 commented Jul 28, 2017

meme-lord commented Jul 29, 2017

kbresin commented Sep 10, 2019

meme-lord commented Sep 10, 2019

kbresin commented Sep 10, 2019

meme-lord commented Sep 10, 2019

kbresin commented Sep 10, 2019 • edited Loading

kbresin commented Sep 10, 2019 •

edited

Loading