Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a utility to provide test data for specific unicode handling issues. #1044

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

ldhardy
Copy link
Collaborator

@ldhardy ldhardy commented Jan 9, 2025

There are a few places where we need more robust testing that we aren't making false assumptions about what the byte array contains and how (we attempt) to do clever things with it. When our code or the 3rd party libraries we use are defaulting to baseless assumptions about how we can move around in the bytes, we end up mangling the data in ways that isn't always obvious. This helper class and the explaining test should allow us to test more edge cases, and avoid using dependencies that don't handle Unicode well. I envision adding more samples as we come across more cases that keep tripping us up, but for now I need the XML map for one of my tickets, and I have plans for the emoji string for a couple of other tests.

The entirety of this change should be test only, used from the emissary test jar. I picked the icu4j version for consistency but I'd really like to always use the latest, it should be at the top of our list of dependencies to keep updated once we start using the icu4j utilities more often (as we should be).

@jpdahlke jpdahlke added this to the v8.21.0 milestone Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants