IPv4-mapped / IPv4-compatible IPv6 addresses (e.g., ::ffff:192.0.2.128) in URLs are mangled by WaybackURLKeyMaker: the enclosing square brackets are not removed, but moved around together with the parts of the host-port combination after splitting at dots:
jshell> import org.archive.url.WaybackURLKeyMaker;
jshell> var km = new WaybackURLKeyMaker();
jshell> km.makeKey("http://[::ffff:123.123.87.87]:8080/index.html")
$3 ==> "87],87,123,[::ffff:123:8080)/index.html"
For comparison, the Python surt module removes the square brackets before splitting at dots and reversing the parts:
$> pip3 show surt
Name: surt
Version: 0.3.1
Summary: Sort-friendly URI Reordering Transform (SURT) python package.
$> python3
Python 3.12.3 (main, Nov 6 2024, 18:32:19) [GCC 13.2.0] on linux
>>> from surt import surt
>>> surt("http://[::ffff:123.123.87.87]:8080/index.html")
'87,87,123,::ffff:123:8080)/index.html'
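The difference between the two outputs can be reproduced with a few lines of Python. This is only a minimal sketch of the host-reversal step, not the actual code of WaybackURLKeyMaker or surt; `reverse_host` and its `strip_brackets` flag are hypothetical names introduced for illustration, and the port and path handling is left out.

```python
def reverse_host(host: str, strip_brackets: bool) -> str:
    """Split a host at dots and join the parts in reverse order with commas."""
    if strip_brackets and host.startswith("[") and host.endswith("]"):
        host = host[1:-1]  # drop the IPv6 literal brackets before splitting
    return ",".join(reversed(host.split(".")))


host = "[::ffff:123.123.87.87]"

# Brackets kept: they travel with the first and last dot-separated parts.
kept = reverse_host(host, strip_brackets=False)
print(kept)      # 87],87,123,[::ffff:123

# Brackets stripped first: matches the host part of the surt output above.
stripped = reverse_host(host, strip_brackets=True)
print(stripped)  # 87,87,123,::ffff:123
```

With the brackets kept, the closing bracket ends up at the front of the key and the opening bracket in the middle, which is the mangling shown in the jshell output.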
I'm not sure what the best representation is:
normalize the IPv4-mapped representation, so that ::ffff:123.123.87.87 becomes either
::ffff:7b7b:5757
or 123.123.87.87
the double use of the colon, both inside IPv6 addresses and as the port separator, is troublesome, but maybe not an issue, because SURT keys are recall-oriented and some ambiguity is acceptable. It would also be a separate issue.
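To make the two candidate normalizations concrete, here is a small sketch using Python's standard `ipaddress` module. It is an illustration of the options listed above, not a proposal for how either library should implement them; `mapped_forms` is a hypothetical helper name. The hexadecimal form is built by hand from the 32-bit value, because the string form that `ipaddress` prints for IPv4-mapped addresses may differ between Python versions.

```python
import ipaddress


def mapped_forms(host: str):
    """Return (hex_form, ipv4_form) for an IPv4-mapped IPv6 host, else None."""
    addr = ipaddress.IPv6Address(host.strip("[]"))
    ipv4 = addr.ipv4_mapped  # None unless the address is in ::ffff:0:0/96
    if ipv4 is None:
        return None
    low32 = int(ipv4)
    # Write the embedded IPv4 address as two 16-bit hexadecimal groups.
    hex_form = "::ffff:{:x}:{:x}".format(low32 >> 16, low32 & 0xFFFF)
    return hex_form, str(ipv4)


print(mapped_forms("[::ffff:123.123.87.87]"))
# ('::ffff:7b7b:5757', '123.123.87.87')
```

Note that `ipv4_mapped` only recognizes the `::ffff:a.b.c.d` form; IPv4-compatible addresses such as `::192.0.2.128` would need separate handling.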