Encoding used for JSON output for binary data #263

Open
p-l- opened this issue May 22, 2020 · 7 comments

Comments

@p-l-

p-l- commented May 22, 2020

Hi,

First, thanks a lot for this project!

I use ZGrab2 to fetch HTTP responses and read their content from the JSON output. I have an issue when that content is binary (e.g., a blob or encrypted data): it is somehow encoded as UTF-8 text in the JSON.

However, in Python (this problem is in IVRE), when I json.loads() a line and then .encode() the content of the field that contains the HTTP content, I don't get the same value as the original file.

Do you have any idea if there is a bug somewhere, or if there is something wrong in what I do / expect?

Thanks!

@p-l-
Author

p-l- commented May 22, 2020

Possibly related to #197.

@p-l-
Author

p-l- commented May 22, 2020

Based on initial investigations, it seems that (at least some) "non-printable" characters are replaced by \ufffd (making it impossible to "decode them", since many different characters are replaced by this value).
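
The loss described above can be reproduced without ZGrab2. The following is a minimal Python sketch, assuming the producer behaves like a UTF-8 decoder that substitutes U+FFFD for every invalid sequence; the helper name `lossy_json_roundtrip` and the `body` key are made up for illustration and are not part of ZGrab2 or IVRE.

```python
import json


def lossy_json_roundtrip(raw: bytes) -> bytes:
    """Mimic a replace-on-error JSON producer, then decode like a consumer."""
    # Assumption: invalid UTF-8 sequences are replaced by U+FFFD on output.
    text = raw.decode("utf-8", errors="replace")
    line = json.dumps({"body": text})  # hypothetical field name
    # Consumer side: parse the line and re-encode the field as UTF-8.
    return json.loads(line)["body"].encode("utf-8")


first = lossy_json_roundtrip(b"\x80abc")
second = lossy_json_roundtrip(b"\xffabc")

print(first == second)      # True: two different inputs collapse to one output
print(first == b"\x80abc")  # False: the original bytes cannot be recovered
```

Both inputs come back as the UTF-8 encoding of U+FFFD followed by `abc`, which is why no post-processing on the consumer side can restore the original response.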

svbatalov added a commit to svbatalov/zgrab2 that referenced this issue Aug 20, 2021
Conversion of binary responses to UTF8 occasionally yields U+FFFD [replacement characters](https://en.wikipedia.org/wiki/Specials_(Unicode_block))
(see zmap#197, zmap#263). As a result it is not possible to restore the original response.

This introduces the `--hex` option to the `banner` module. When enabled,
the `banner` value will contain server response in hex.

Refs zmap#197, zmap#263
dadrian pushed a commit that referenced this issue Aug 29, 2021
Conversion of binary responses to UTF8 occasionally yields U+FFFD [replacement characters](https://en.wikipedia.org/wiki/Specials_(Unicode_block))
(see #197, #263). As a result it is not possible to restore the original response.

This introduces the `--hex` option to the `banner` module. When enabled,
the `banner` value will contain server response in hex.

Refs #197, #263

#325
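
On the consumer side, a hex-encoded banner can be turned back into the original bytes with `bytes.fromhex`. This is only a sketch: the sample line is invented, and the key nesting used in `banner_bytes` (a hypothetical helper) is an assumption that should be checked against real output.

```python
import json

# Invented sample line; the nesting of keys is an assumption for illustration.
sample_line = (
    '{"ip": "192.0.2.1", '
    '"data": {"banner": {"result": {"banner": "00ff10801b5b"}}}}'
)


def banner_bytes(line: str) -> bytes:
    """Return the raw banner bytes from one JSON line with a hex banner."""
    record = json.loads(line)
    hex_banner = record["data"]["banner"]["result"]["banner"]
    # Hex is plain ASCII, so it passes through JSON without any replacement.
    return bytes.fromhex(hex_banner)


raw = banner_bytes(sample_line)
```

Because every byte maps to exactly two hex digits, bytes such as 0x80 or 0xff that are not valid UTF-8 on their own survive the round trip unchanged.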
@p-l-
Author

p-l- commented Sep 29, 2021

Fixed in #325. Thanks!

@p-l- p-l- closed this as completed Sep 29, 2021
@p-l-
Author

p-l- commented Jan 14, 2022

Unfortunately, this option is only valid for the banner module, not the http module. Also, it would be great to have a special attribute (e.g., is_hex) so that tools can tell whether the value is encoded as hex or not.
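
To illustrate why such an attribute would help, here is a sketch of how a consumer could dispatch on it. The `is_hex` key does not exist in ZGrab2 output as far as this thread shows; it is the hypothetical attribute proposed above, and `field_bytes` is an invented helper.

```python
def field_bytes(result: dict, key: str = "banner") -> bytes:
    """Return the bytes of a result field, honouring a hypothetical is_hex flag."""
    value = result[key]
    # Hypothetical attribute: absent or False means the value is plain text.
    if result.get("is_hex", False):
        return bytes.fromhex(value)
    return value.encode("utf-8")


hex_result = {"banner": "00ff", "is_hex": True}
text_result = {"banner": "SSH-2.0-OpenSSH"}

decoded_hex = field_bytes(hex_result)
decoded_text = field_bytes(text_result)
```

Without the flag, a value like `00ff` is ambiguous: it could be two raw bytes in hex or the four-character text itself, so a tool has to know out of band how the scan was run.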

@BuileaTM

> Unfortunately, this option is only valid for the banner module, not the http module. Also, it would be great to have a special attribute (e.g., is_hex) so that tools can tell whether the value is encoded as hex or not.

Does anyone have some sort of workaround for this encoding issue, specifically for the http module?

@LloydLabs

> > Unfortunately, this option is only valid for the banner module, not the http module. Also, it would be great to have a special attribute (e.g., is_hex) so that tools can tell whether the value is encoded as hex or not.
>
> Does anyone have some sort of workaround for this encoding issue, specifically for the http module?

I added an --encode-response flag to a fork of zgrab2 which does this. You just need to add the following line right before the return of the getCheckRedirect and Grab functions, to ensure the hash is still computed correctly:

res.BodyText = base64.StdEncoding.EncodeToString([]byte(res.BodyText))

Along with the flag at the top of the source file to enable it.
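
For readers consuming output from such a fork, the following Python sketch shows the matching decode step. It assumes the body arrives Base64-encoded under a JSON key named `body` (an assumption for illustration; check the actual output). Go's `base64.StdEncoding` is the standard padded alphabet, which Python's `base64.b64decode` handles.

```python
import base64
import json

# Stand-in for a binary HTTP body that is not valid UTF-8.
original = b"\x89PNG\r\n\x1a\n\x00\xff"

# Producer side (simulated here): Base64 text is pure ASCII and JSON-safe.
line = json.dumps({"body": base64.b64encode(original).decode("ascii")})


def body_bytes(json_line: str) -> bytes:
    """Decode a Base64 body field back into the original bytes."""
    encoded = json.loads(json_line)["body"]  # hypothetical key name
    # validate=True rejects characters outside the standard alphabet.
    return base64.b64decode(encoded, validate=True)


decoded = body_bytes(line)
```

Unlike the UTF-8 path, this round trip is exact, at the cost of roughly a one-third increase in the size of the encoded field.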

@mzpqnxow
Contributor

@p-l- is this still an issue as far as you know? I haven't encountered it, so I'm not able to tell.

If so, @LloydLabs, would you have time to send the changes in that fork you mentioned as a PR?
