Adding a utility to provide test data for specific Unicode handling issues. #1044
There are a few places where we need more robust tests to confirm that we aren't making false assumptions about what a byte array contains, or about the clever things we attempt to do with it. When our code, or the third-party libraries we use, defaults to baseless assumptions about how we can move around in the bytes, we end up mangling the data in ways that aren't always obvious. This helper class and the accompanying explanatory test should let us test more edge cases and avoid dependencies that don't handle Unicode well. I envision adding more samples as we come across more cases that keep tripping us up, but for now I need the XML map for one of my tickets, and I have plans for the emoji string in a couple of other tests.
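To make the failure mode concrete, here is a minimal sketch of the kind of mangling I mean. It is not the helper class from this change: the class name `UnicodeSliceDemo`, the sample string, and both methods are hypothetical and use only the JDK. It assumes the input is well-formed UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the helper class added in this PR.
public class UnicodeSliceDemo {
    // 'a', U+1F600 (grinning face, written as a surrogate pair), 'b'.
    // In UTF-8 this is 1 + 4 + 1 = 6 bytes.
    public static final String SAMPLE = "a\uD83D\uDE00b";

    /** Cuts the array at an arbitrary byte offset and decodes the prefix as UTF-8. */
    public static String naivePrefix(byte[] utf8, int byteLength) {
        // If byteLength lands inside a multi-byte sequence, the decoder
        // substitutes U+FFFD for the truncated sequence.
        return new String(utf8, 0, byteLength, StandardCharsets.UTF_8);
    }

    /**
     * Returns the largest offset <= limit that does not split a UTF-8 sequence.
     * Assumes the array holds well-formed UTF-8.
     */
    public static int safeCut(byte[] utf8, int limit) {
        int i = Math.min(limit, utf8.length);
        // Continuation bytes have the bit pattern 10xxxxxx; back up past them
        // so the cut falls on the first byte of a sequence.
        while (i > 0 && i < utf8.length && (utf8[i] & 0xC0) == 0x80) {
            i--;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] bytes = SAMPLE.getBytes(StandardCharsets.UTF_8);
        System.out.println("chars=" + SAMPLE.length()
                + " codePoints=" + SAMPLE.codePointCount(0, SAMPLE.length())
                + " bytes=" + bytes.length);

        // Offset 3 falls in the middle of the emoji's four bytes.
        String mangled = naivePrefix(bytes, 3);
        System.out.println("naive cut contains U+FFFD: " + (mangled.indexOf('\uFFFD') >= 0));

        // Backing up to a sequence boundary keeps the prefix intact.
        int cut = safeCut(bytes, 3);
        System.out.println("safe cut offset=" + cut + " prefix=" + naivePrefix(bytes, cut));
    }
}
```

The point is that the three lengths (chars, code points, bytes) all differ for the same string, so any code that treats one as another will silently corrupt data like this. A sample with exactly that property is what the emoji string is meant to provide for tests.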
This entire change should be test-only, used from the emissary test jar. I picked the icu4j version for consistency, but I'd really like to always use the latest; it should be at the top of our list of dependencies to keep updated once we start using the icu4j utilities more often (as we should be).