feat: Support diffing text "binary" snapshots #708

Open
wants to merge 1 commit into
base: master

PookieBuns

Thanks to #610, we have taken a large step toward being able to use insta for schema snapshotting, as per #475. However, binary files currently cannot be diffed when running cargo test or cargo insta review, which makes schema changes difficult to compare and review. This PR lets insta try to decode binary files as UTF-8 so that the text-based subset of these files can be diffed in the workflow.

@max-sixty
Collaborator

There's definitely something useful here; I can see some good cases for using the binary snapshot format for text files; for example when we want a raw file without a header.

I'm a bit concerned that this could lead to some unexpected behavior though:

  • If the binary snapshots are multiple megabytes, how does this behave?
    • We could add a limit, though it could then be confusing why some snapshots are presented for review and some aren't
  • What are the chances of some bytes getting through the String::from_utf8(new.to_vec()).ok() check and the screen being filled with meaningless characters?
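On the second concern: valid UTF-8 includes NUL and the other ASCII control characters, so a successful decode does not by itself guarantee readable output. A minimal sketch of a stricter check, assuming a hypothetical helper named `decode_printable` that is not part of this PR or of insta:

```rust
/// Hypothetical helper (not from the PR): returns the bytes as `&str` only if
/// they are valid UTF-8 and contain no control characters other than tab,
/// line feed, and carriage return.
fn decode_printable(bytes: &[u8]) -> Option<&str> {
    // Invalid UTF-8 is rejected outright.
    let text = std::str::from_utf8(bytes).ok()?;
    // NUL, ESC, and similar characters are valid UTF-8 but would clutter a terminal.
    let printable = text
        .chars()
        .all(|c| !c.is_control() || matches!(c, '\t' | '\n' | '\r'));
    printable.then_some(text)
}
```

Borrowing with `std::str::from_utf8` also avoids the copy made by `new.to_vec()`.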

@PookieBuns
Author

PookieBuns commented Jan 14, 2025

I think we can draw some inspiration from how GNU diff behaves. Section 1.7 of its manual states:

diff determines whether a file is text or binary by checking the first few bytes in the
file; the exact number of bytes is system dependent, but it is typically several thousand. If
every byte in that part of the file is non-null, diff considers the file to be text; otherwise
it considers the file to be binary.

You can also force diff to compare files as text using the --text (-a) option. I think this would be a good start for this feature, as it would be opt-in only.
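A rough sketch of that heuristic, for illustration only. The function name `looks_like_text`, the `force_text` flag, and the 8192-byte window are assumptions for this example; the manual only says the window is system dependent and "typically several thousand" bytes.

```rust
/// Number of leading bytes to inspect. GNU diff's real value is system
/// dependent; 8192 is an arbitrary choice for this sketch.
const SNIFF_LEN: usize = 8192;

/// Treats the data as text when no NUL byte occurs in the first `SNIFF_LEN`
/// bytes, or when `force_text` is set (analogous to `diff --text`).
fn looks_like_text(bytes: &[u8], force_text: bool) -> bool {
    // Only the prefix is scanned, so the cost does not grow with file size.
    let prefix = &bytes[..bytes.len().min(SNIFF_LEN)];
    force_text || !prefix.contains(&0)
}
```

Because only a bounded prefix is inspected, a NUL byte that appears later in the file goes unnoticed, which is the same trade-off the manual describes.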

I could add a behavior.text_diff option to the Tool Config File, turned off by default, that enables diffing binary files as text. We could also expose this as an option in cargo insta when reviewing files.
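To make the opt-in shape concrete, here is a small sketch. `Behavior` and `text_pair` are hypothetical names for this example and do not reflect insta's actual configuration types; only the `text_diff` field name comes from the proposal above.

```rust
/// Hypothetical settings struct; `text_diff` mirrors the proposed
/// `behavior.text_diff` key and is not an existing insta option.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Behavior {
    /// `bool::default()` is `false`, so the feature stays off unless enabled.
    text_diff: bool,
}

/// Returns both sides as text only when the user opted in and both sides are
/// valid UTF-8; otherwise the caller keeps the existing binary handling.
fn text_pair<'a>(
    behavior: Behavior,
    old: &'a [u8],
    new: &'a [u8],
) -> Option<(&'a str, &'a str)> {
    if !behavior.text_diff {
        return None;
    }
    Some((std::str::from_utf8(old).ok()?, std::str::from_utf8(new).ok()?))
}
```

With the default settings nothing changes for existing users, since `text_pair` returns `None` before any decoding is attempted.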

Regarding your concerns, in the case where we make it opt-in:

  1. It appears that if you force a text diff on large files (I tested with 2 video files), it will still try to diff them and produce erroneous output (at least this is the behavior on macOS, although it's not technically GNU diff).

  2. I think if you do opt in to text_diff, you probably either:

    1. Know for sure your files are text
    2. Are prepared to see some erroneous output

@max-sixty
Collaborator

max-sixty commented Jan 27, 2025

Yes, I think that could be viable!

That said, I'm a bit concerned that we're adding more options to our interface when there's a quite reasonable alternative — accepting the snapshot and running a git diff on the files.

If there's a way of building this so it's either:

  • really well-behaved by default (i.e. doesn't show a mess of characters), or
  • it's a single option in the config file and it doesn't have any really bad behavior (i.e. doesn't freeze on a multi-megabyte file)

...then I would support this
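One possible way to meet both conditions, sketched under assumptions: check the size before decoding, and return a reason when no text is shown so the review UI can explain why a snapshot has no diff. The `Preview` enum, the `preview` function, and the caller-supplied `max_bytes` cap are all hypothetical and not part of insta.

```rust
/// Hypothetical outcome of deciding how to present a binary snapshot.
#[derive(Debug, PartialEq)]
enum Preview<'a> {
    /// Small enough and valid UTF-8: safe to hand to the text differ.
    Text(&'a str),
    /// Larger than the cap; carries the size so the UI can say why it was skipped.
    TooLarge(usize),
    /// Within the cap but not valid UTF-8.
    NotUtf8,
}

fn preview(bytes: &[u8], max_bytes: usize) -> Preview<'_> {
    // The length check comes first, so oversized input is never scanned.
    if bytes.len() > max_bytes {
        return Preview::TooLarge(bytes.len());
    }
    match std::str::from_utf8(bytes) {
        Ok(text) => Preview::Text(text),
        Err(_) => Preview::NotUtf8,
    }
}
```

Surfacing `TooLarge` and `NotUtf8` as distinct results is meant to address the earlier worry that a limit would make it unclear why some snapshots are shown as diffs and others are not.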

Also ofc open to others' views!

2 participants