V3 archives are not transferable between little and big endian architectures #2110

Open
kurik opened this issue Dec 18, 2024 · 1 comment

Contributor

kurik commented Dec 18, 2024

During extended testing we found an issue on big-endian machines when using V3 archives from the testsuite. A deeper investigation shows that the problem is in the timestamps, which are stored in neither big-endian nor little-endian format but in a hybrid format that mixes the two. This causes no problem when the same endianness is used both for storing and for restoring/loading data from a V3 archive. However, when a V3 archive is generated on a little-endian architecture and restored/loaded on a big-endian one (or vice versa), the timestamps are corrupted.

My assumption is that V3 archives were meant to be endianness-agnostic, which is why the so-called network format (big-endian) for 64-bit integers was chosen. This is similar to the V2 format, which uses the network format for 32-bit integers.
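To illustrate what "endianness-agnostic" means here, the following is a minimal sketch of storing a 64-bit integer in network (big-endian) byte order using shifts only, so the bytes written are the same on any host. The helper names `sketch_put_be64` and `sketch_get_be64` are hypothetical and are not part of PCP.

```c
#include <stdint.h>

/* Hypothetical helper (not PCP code): write v into out[] with the most
 * significant byte first, independent of the host byte order. */
static void sketch_put_be64(unsigned char out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Hypothetical helper (not PCP code): read a 64-bit value that was
 * stored most significant byte first. */
static uint64_t sketch_get_be64(const unsigned char in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];
    return v;
}
```

Because only arithmetic shifts are used and memory is never reinterpreted, a value written this way on x86_64 reads back identically on s390x.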

The problematic part is the implementation of the __htonll function in the endian.c file. This function is used by __pmLoadTimestamp, __pmPutTimestamp, etc. to load/save timestamps from/to archives. IMO, on big-endian architectures __htonll should do nothing, because the byte order is already the network format. Otherwise we end up with little-endian byte order in a 64-bit integer on a big-endian architecture, which leads to a corrupted value.
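The behaviour proposed above can be sketched as follows. This is not the code from endian.c and not a tested patch; all `sketch_*` names are hypothetical, and the run-time endianness probe is an assumption made for illustration (a real fix might well use a compile-time check instead).

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 on a big-endian host, 0 otherwise. */
static int sketch_host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    return *(const unsigned char *)&probe == 0x01;
}

/* Unconditionally reverse the 8 bytes of v. */
static uint64_t sketch_bswap64(uint64_t v)
{
    v = ((v & 0x00000000ffffffffULL) << 32) | (v >> 32);
    v = ((v & 0x0000ffff0000ffffULL) << 16) | ((v >> 16) & 0x0000ffff0000ffffULL);
    v = ((v & 0x00ff00ff00ff00ffULL) << 8) | ((v >> 8) & 0x00ff00ff00ff00ffULL);
    return v;
}

/* Host to network order: a no-op on big-endian hosts, a byte swap on
 * little-endian hosts. */
static uint64_t sketch_htonll(uint64_t host)
{
    return sketch_host_is_big_endian() ? host : sketch_bswap64(host);
}

/* Network to host order is the same operation. */
static uint64_t sketch_ntohll(uint64_t net)
{
    return sketch_htonll(net);
}
```

With this shape, the in-memory bytes of `sketch_htonll(x)` are most significant byte first on both kinds of host, and `sketch_ntohll(sketch_htonll(x))` returns `x`.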

One specific example of the issue is this reproducer:

  • Prerequisites: pcp-testsuite is installed
cd /var/lib/pcp/testsuite/archives/multi_v3
PCP_DEBUG=desperate,logmeta pmlogextract ./20150508.11.44 /tmp/discard |& grep -e __pmPutTimestamp | head -n 1

On x86_64 arch (little endian) this generates the following output:

__pmPutTimestamp: 1431099844.631443000 (554cd9c4 25a30e38 nsec) -> network(c4d94c5500000000 380ea325 nsec)

On s390x arch (big endian) this generates the following output:

__pmPutTimestamp: 1431099844.631443000 (554cd9c4 25a30e38 nsec) -> network(00000000554cd9c4 25a30e38 nsec)

Note the difference in byte order in the first field of the network part between the two outputs.
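For reference, the hex values printed on the two architectures are exact byte reversals of each other, which a small check can confirm. This is only a sketch relating the printed numbers; the `demo_swap*` helpers are hypothetical and say nothing about how PCP formats its debug output.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the 4 bytes of a 32-bit value. */
static uint32_t demo_swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00U) |
           ((v << 8) & 0x00ff0000U) | (v << 24);
}

/* Hypothetical helper: reverse the 8 bytes of a 64-bit value by
 * swapping each 32-bit half and exchanging the halves. */
static uint64_t demo_swap64(uint64_t v)
{
    uint32_t hi = (uint32_t)(v >> 32);
    uint32_t lo = (uint32_t)(v & 0xffffffffU);
    return ((uint64_t)demo_swap32(lo) << 32) | demo_swap32(hi);
}
```

Here `demo_swap64(0x00000000554cd9c4)` yields `0xc4d94c5500000000` and `demo_swap32(0x25a30e38)` yields `0x380ea325`, matching the sec and nsec fields shown in the x86_64 and s390x outputs.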

I have not tested this, but in theory the implementation of __htonll should also affect other parts of PCP where 64-bit integers are used (e.g. communication between pmcd and pmlogger running on systems with different endianness).

Contributor Author

kurik commented Dec 18, 2024

This is a follow-up to RHEL-61501.
