Adding a utility to provide test data for specific unicode handling issues. #1044

ldhardy · 2025-01-09T20:32:36Z

There are a few places where we need more robust tests to confirm we aren't making false assumptions about what a byte array contains, or about the clever things we attempt to do with it. When our code, or the third-party libraries we use, defaults to baseless assumptions about how we can move around in the bytes, we end up mangling the data in ways that aren't always obvious. This helper class and its accompanying explanatory test should let us cover more edge cases and avoid dependencies that don't handle Unicode well. I envision adding more samples as we run into more cases that keep tripping us up, but for now I need the XML map for one of my tickets, and I plan to use the emoji string in a couple of other tests.
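To make the failure mode concrete, here is a minimal sketch using only the JDK. It is not the helper class from this PR: the class name `UnicodeByteAssumptions` and its methods are hypothetical, chosen for illustration. It shows how a single emoji has three different "lengths" depending on what you count, and how cutting a UTF-8 byte array at an arbitrary offset silently replaces part of the data with U+FFFD.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the helper class added in this PR.
final class UnicodeByteAssumptions {

    // U+1F926 FACE PALM, written as a UTF-16 surrogate pair.
    static final String FACEPALM = "\uD83E\uDD26";

    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes once encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Naive truncation: cuts at a byte offset without checking whether the
    // cut lands inside a multi-byte sequence.
    static String naiveTruncate(String s, int maxBytes) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int end = Math.min(maxBytes, bytes.length);
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    // Safer truncation: backs up until the cut no longer falls on a UTF-8
    // continuation byte (bit pattern 10xxxxxx). Assumes maxBytes >= 0.
    static String safeTruncate(String s, int maxBytes) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return s;
        }
        int end = maxBytes;
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(utf16Units(FACEPALM)); // 2
        System.out.println(codePoints(FACEPALM)); // 1
        System.out.println(utf8Bytes(FACEPALM));  // 4

        String sample = "a" + FACEPALM;           // 5 bytes in UTF-8
        // The naive cut leaves half an emoji, which decodes to U+FFFD.
        System.out.println(naiveTruncate(sample, 3).contains("\uFFFD"));
        // The boundary-aware cut drops the incomplete emoji entirely.
        System.out.println(safeTruncate(sample, 3).equals("a"));
    }
}
```

Note that `safeTruncate` only respects code point boundaries. It would still split a multi-code-point grapheme cluster (for example an emoji with a skin tone modifier or ZWJ sequence), which is the kind of edge case the sample strings are meant to exercise.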

This change should be entirely test-only, used from the emissary test jar. I picked the icu4j version for consistency, but I'd really like to always use the latest; it should be at the top of our list of dependencies to keep updated once we start using the icu4j utilities more often (as we should be).

ldhardy added 13 commits January 7, 2025 18:06

Adding a helper class to provide sample unicode strings for testing.

210a008

Merge branch 'main' into complex-unicode-string-samples

b9fc318

Updating javadoc

6bacd91

Adding in facepalm example

259e08c

Updating documentation and removing icu4j dep

dd3d475

Removing icu4j dep

09abd78

Merge branch 'main' into complex-unicode-string-samples

33ac82f

More updates

dfdbd4c

Updates to test

47559de

Adding in normalization observation.

a17c005

Updating javadoc

661c2b3

Removing comment xml formatting

f90b972

Removing comment xml formatting

34a37a4

jpdahlke added this to the v8.21.0 milestone Jan 10, 2025
