Discussion: Float encoding #2
Comments
So, having a way to do base62 encoding of floats has struck my mind, but I've disregarded it since I wanted to use JavaScript's float generator and parser. I didn't consider the fact that JavaScript lets you convert floats to strings with different bases. I think the base 10 syntax has to stay. That doesn't necessarily mean we can't add a base 36 (or base 62 or anything else) in addition, but here are the reasons why we can't replace the base 10 syntax with a base 36 syntax:
I implemented a naïve change which just replaces the current
Now, there is a use-case for a more efficient float encoding. Storing the state of a physics-based game, for example, would have a lot of entities with position and velocity vectors which contain values that are essentially picked arbitrarily from the real number line without regard for any base. And base 36 or 62 floats would certainly be an advantage there. I wouldn't be against adding a new float syntax, maybe one which works like the base 62 integer syntax today; |
I was wondering about the same thing. Here's one idea I had, maybe it would work for your use cases? If it clashes with some design nuance I missed in my brief exposure to the library, feel free to disregard it. Since we already have the possibility of exponential notation via But we can do better: we can get rid of the So I have a working but totally unsafe implementation here: https://observablehq.com/d/e3010313d892b36e (unsafe as in "it does not test and bail for invalid string input and probably can result in infinite loops as a result". But you already have a B62 parser yourself so that's the hard part handled)
|
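For readers following along, here is a minimal sketch of the kind of base 62 integer codec that the comments above propose reusing. The alphabet order and the names `ALPHABET`, `encodeB62` and `decodeB62` are assumptions made for illustration; they are not taken from JCOF or from the linked notebook.

```javascript
// Hypothetical alphabet: digits, then lowercase, then uppercase.
// JCOF's actual ordering may differ; check the spec before relying on it.
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encode a non-negative safe integer as a base 62 string.
function encodeB62(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  if (n === 0) return "0";
  let out = "";
  while (n > 0) {
    out = ALPHABET[n % 62] + out;
    n = Math.floor(n / 62);
  }
  return out;
}

// Decode a base 62 string back to a number, rejecting unknown characters.
function decodeB62(str) {
  if (str.length === 0) throw new SyntaxError("empty string");
  let n = 0;
  for (const ch of str) {
    const digit = ALPHABET.indexOf(ch);
    if (digit < 0) throw new SyntaxError("invalid base 62 digit: " + ch);
    n = n * 62 + digit;
  }
  return n;
}
```

Both proposals in this thread build on integer pieces like these, so the hard part of a float syntax is deciding how to split a double into integers without losing bits, not the digit conversion itself.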
@mortie, thanks for the detailed response! Your reasoning makes a lot of sense.
Well, this doesn't entirely hold for base 62 integers either, but given that in JS there's no difference between integers and floats, I suppose at the moment the float notation works as a makeshift solution for that :) I respect that any change should be justified by both a use-case and data, and that the present notation is good enough for what JCOF sets out to achieve -- though as you said, there are some cases where a lot of floats might need to be passed around, and there's not currently a dataset among benchmarks that could represent that case, so perhaps one could inform design decisions. @JobLeonard, I really like your proposals! If I might riff on them a bit: Reusing the existing base 62 encoder/decoder makes a lot of sense. I'd've tried it too; the base 36 was meant as an illustrative example, and doing either of your proposals fits much better along with rest of JCOF. For the first proposal, an equivalent of I like the sublimity of the second one, given that it's essentially just a compressed form of what JCOF does at the moment :) Either way, all of these solutions (both the current decimal scheme, naïve base 36, as well as both proposed by Job) suffer to some extent from the base-incompatibility problem if trying to encode numbers like That said, just truncating after some |
A proper float to string algorithm doesn't just choose some number of sufficient digits; the string My problem is that I only know about these challenges, not how they're solved, or even how difficult they are to solve. FWIW, I tried out the algorithms in https://observablehq.com/d/e3010313d892b36e, and a lot of numbers aren't round-tripped correctly. For example, the base 10 float Here's a snippet which can be used to test all the floats:

    // Return the next representable double after value, moving towards +Infinity
    // Adapted from https://stackoverflow.com/a/27661477
    function nextNearest(value) {
      if (!isFinite(value))
        return value;
      // Two views over the same 8 bytes: one as a double, one as two 32-bit words
      var buffer = new ArrayBuffer(8);
      var f64 = new Float64Array(buffer);
      var u32 = new Uint32Array(buffer);
      f64[0] = value;
      if (value === 0) {
        // Smallest positive subnormal
        u32[0] = 1;
        u32[1] = 0;
      } else if (value > 0) {
        // Positive: incrementing the bit pattern increases the value
        if (u32[0]++ === 0xFFFFFFFF)
          u32[1]++;
      } else {
        // Negative: decrementing the bit pattern moves towards zero
        if (u32[0]-- === 0)
          u32[1]--;
      }
      return f64[0];
    }

    let f = 0;
    while (f !== Infinity) {
      let str = stringifyF62(f);
      let parsed = parseF62(str);
      if (parsed === f) {
        console.log("OK:", f, "from string", str);
      } else {
        console.log("ERR: expected", f, "got", parsed, "from string", str);
      }
      f = nextNearest(f);
    }

Replace with |
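To make the point about "sufficient digits" concrete, here is a brute-force sketch for base 10 only. It searches for the fewest significant digits that still parse back to the same double. The function name `shortestRoundTripDigits` is hypothetical, and this is only an illustration of the goal; real shortest-representation algorithms are considerably more sophisticated and do not work by trial and error.

```javascript
// Smallest number of significant decimal digits p (1..17) for which
// toPrecision(p) parses back to exactly the same double.
function shortestRoundTripDigits(x) {
  if (!Number.isFinite(x)) throw new RangeError("expected a finite number");
  for (let p = 1; p <= 17; p++) {
    if (Number(x.toPrecision(p)) === x) return p;
  }
  return 17; // 17 significant digits always suffice for a double
}

console.log(shortestRoundTripDigits(0.1), String(0.1));
console.log(shortestRoundTripDigits(0.1 + 0.2), String(0.1 + 0.2));
```

The number of digits needed varies from value to value, which is why a fixed cut-off either wastes characters or fails to round-trip. JavaScript's built-in base 10 conversion already picks a shortest round-tripping string, which is part of what a base 62 float syntax would have to reproduce by hand.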
@mortie yeah, after thinking a bit more about the "v1" approach that checks out - I do a multiplication which can lead to rounding errors. The second version that first uses |
Ok so first, thanks for that test function, that's been a major help in debugging! There's one major flaw in the implementation though: it allocates new TypedArrays every call 😱. Anyway, that took longer than a few minutes because I bumped into two edge-cases:
The first point wasn't too much of a pain, since it was basically just adapting the ideas from before but in a way that guarantees that we'll never have longer strings in the end. The second point took more time. If I naively convert this the leading zeroes get dropped, resulting in To keep the output compact my solution was to introduce a second separator to indicate leading zeros:
I don't think it's possible for any double to be converted to a number with a fraction with 62 leading zeroes, and vice versa a handcrafted number string with 62 leading zeroes would not convert to a double with the same value. In that case I would consider this safe (I'm still letting the tests run in the background to verify this, it's taking forever even with that TypedArray hoisting). Oh, and I also added a simple check to see if the converted string is longer than a plain |
(updated code can be found at the same link as before: https://observablehq.com/d/e3010313d892b36e) |
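The TypedArray hoisting mentioned above could look roughly like the sketch below. It is not the notebook's code: `nextUp` and `checkRoundTrip` are hypothetical names, and the built-in `String`/`Number` pair stands in for `stringifyF62`/`parseF62`, which live in the linked notebook. The test run is bounded to a fixed count so it finishes quickly, rather than walking every double.

```javascript
// Allocate the shared buffer and its two views once, outside the function.
const buffer = new ArrayBuffer(8);
const f64 = new Float64Array(buffer);
const u32 = new Uint32Array(buffer);

// Next representable double above `value`. Like the earlier nextNearest
// snippet, this assumes a little-endian platform, where u32[0] holds the
// low 32 bits of the double.
function nextUp(value) {
  if (!Number.isFinite(value)) return value;
  f64[0] = value;
  if (value === 0) {
    u32[0] = 1; // smallest positive subnormal
    u32[1] = 0;
  } else if (value > 0) {
    if (u32[0]++ === 0xFFFFFFFF) u32[1]++;
  } else {
    if (u32[0]-- === 0) u32[1]--;
  }
  return f64[0];
}

// Round-trip `count` consecutive doubles starting at `start` and collect
// the failures instead of logging every value.
function checkRoundTrip(stringify, parse, start, count) {
  const failures = [];
  let f = start;
  for (let i = 0; i < count && f !== Infinity; i++) {
    const str = stringify(f);
    const parsed = parse(str);
    if (!Object.is(parsed, f)) failures.push({ f, str, parsed });
    f = nextUp(f);
  }
  return failures;
}

// Built-in decimal conversion stands in for the base 62 functions here.
console.log(checkRoundTrip(String, Number, 1, 100000).length);
```

Collecting failures rather than logging each value keeps the run fast enough to sample several starting points, for example near powers of two or among the subnormals, where edge cases like the ones described here tend to show up.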
Ok, dammit, found another roundtrip bug:
The cause is that I think that's easy enough to fix though, just have to parse the separate integer, fraction and exponent strings as |
Ok so this is as much as I'll write today - I already got nerdsniped on my free Saturday for too long now, although I had fun doing it :). I'm sure there is still an edge-case I missed, but I think it pretty much works now. Updated code still at https://observablehq.com/d/e3010313d892b36e. To summarize, the notation logic is as follows:
Essentially, Currently |
... actually, given how I currently coded it we could replace |
At the risk of causing bikeshedding, I wanted to raise a point of discussion regarding how floats are encoded -- given that integers are encoded in Base62 and therefore take up less space than their decimal form, it seems a bit odd that floats are left as-is.
My first idea was that they could also be encoded in a similar compacted form, such as base36, which JavaScript can do natively with `.toString(36)`, but that isn't particularly great if the float's magnitude is not close to 0 because the exponent `e+n` notation can't be used unambiguously, and parsing them back from such a format probably isn't easy (definitely not in JavaScript).
I'd love to hear more about the rationale behind this decision (if any)! All in all, for the goals that it sets out to accomplish, JCOF is really neat.
(Edit, this did end up causing bikeshedding, especially in the other issues, sorry for that.)
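As a small illustration of the asymmetry described in this post: JavaScript can print a float in base 36, but has no built-in way to read one back, since `parseFloat` takes no radix and `parseInt` stops at the dot. The parser below, `parseBase36Float`, is a hypothetical, naive sketch that only handles plain `digits.digits` strings; summing digit by digit can accumulate rounding error, so it is not guaranteed to round-trip every double.

```javascript
const encoded = (0.5).toString(36);
console.log(encoded);                // "0.i"
console.log(parseFloat(encoded));    // 0, parseFloat only understands base 10
console.log(parseInt(encoded, 36));  // 0, parseInt stops at the "."

// Naive parser for plain base 36 strings such as "0.i" or "-1.i".
// Assumption: no exponent notation and no input validation.
function parseBase36Float(str) {
  const negative = str.startsWith("-");
  const [intPart, fracPart = ""] = (negative ? str.slice(1) : str).split(".");
  let value = parseInt(intPart, 36);
  let divisor = 1;
  for (const ch of fracPart) {
    divisor *= 36;
    value += parseInt(ch, 36) / divisor;
  }
  return negative ? -value : value;
}

console.log(parseBase36Float(encoded)); // 0.5
```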