Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to parse/format a float/double from/to a hexadecimal literal #1630

Open
dovisutu opened this issue Jan 11, 2020 · 24 comments
Open
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@dovisutu
Copy link

dovisutu commented Jan 11, 2020

Part of #27204

In IEEE 754:2008 part 5.12.3, transfering a float/double from/to an external hexadecimal-significand character sequence representing finite number is requested while we don't have it yet. The pattern is like this: (regex)

[+-]?0[xX](?:[\da-fA-F]*\.[\da-fA-F]+|[\da-fA-F]\.?)(?:[pP][+-]?\d+)?

notice that this is slightly different to the standard based on the discussion below, which talked about the exponent part

which means:

valid invalid
+0x7ff.3edp+1 +7ff.3edp-1
0x7ff.3edp+1 0x7ff.3ede+1
0x7ff.3edp1 0x7ff.3uup1
+0x7ff.3edp1 0x7ff.3ed
+0X7FF.3EDP1 0X7FF.3ED
-0x7ff.3edp1 0x7ff_3edp-1
0x7ff. +-0x7ff.3edp-1
0x7ff 0x7fu.3edp-1
0x.3edp-1 0x.p-1

benefits

  • Easier parsing for native hexadecimal float/double. for example, 0x0.ffp0 is the equivalent of 0.99609375 while using less chars. Same for formatting as it reduces the size of string to transfer.
  • Parsed number is percise in the limit of float/double's limitations. Since both float and double are based on raw bit, transistion from hexadecimal literal to them would be easier and without rounding if they are in the limitation. parsing decimals, on the other hand, would often has to round.
  • It's in IEEE 754:2008/2019 standard, so it's necessary to add it.

proposal APIs

namespace System.Globalization {
    [Flags]
    enum NumberStyles {
        // ...,
        HexFloat = AllowLeadingWhite | AllowTrailingWhite | AllowLeadingSign | AllowHexSpecifier | AllowDecimalPoint
        // ,...
    }
}

And let Numbers.ParseDouble/Single(string s, NumberStyles style[, NumberFormatInfo info]) accept Numberstyles.HexFloat (or it without NumberStyles.AllowDecimalPoint) and correctly parse string.

Edit Numbers.FormatDouble/Single(ref ValueStringBuilder sb, double/float value, ReadOnlySpan<char> format, NumberFormatInfo info) so that they can correctly identify X specifier (which is also used for outputting integers in hex) which may have a trailing precision specifier, and correctly format it.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Runtime untriaged New issue has not been triaged by the area owner labels Jan 11, 2020
@stephentoub
Copy link
Member

cc: @tannergooding

@Gnbrkm41
Copy link
Contributor

Tracked by #1387?

@tannergooding
Copy link
Member

Thanks for opening this, its good to see additional customer wants for these areas 😄

It is partially tracked by https://github.com/dotnet/corefx/issues/31901 (which is IEEE 754:2008 compliance) and by #1387 (which is IEEE 754:2019 compliance), but those are largely meta issues and the individual proposals will be easier to take through API review.

The format specifier can be broken down into:

sign:           [+-]?
hexIndicator:   0[xX]
hexSignificand: (?:[\da-fA-F]*\.[\da-fA-F]+|[\da-fA-F]+\.?)
decExponent:    [pP][+-]?\d+

The terminal [fFdD]? listed is not actually part of the IEEE specification and should be excluded.
I also updated hexSignificand to clarify that just . is not valid.

The computed value is hexSignificand * 2^decExponent.

So, for example if you have 0x1.234p0 this is:

  • 0x1 == 1
  • 0x234 * 16^-3 == 564 * 16^-3 == 0.1376953125
  • 1.1376953125 * 2^0 == 1.1376953125

@tannergooding
Copy link
Member

Even though this isn't a new API, I believe we still want to take it through API review since it is modifying an existing API.

We would want to check it against the compat bar and make the necessary decisions around what flags would be used to support this functionality and ensure that we wouldn't accidentally introduce any breaking changes, etc.

@tannergooding tannergooding added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Numerics and removed area-System.Runtime untriaged New issue has not been triaged by the area owner labels Jan 11, 2020
@dovisutu
Copy link
Author

I've updated the issue description to show what changes should be made. Free to discuss.

@tannergooding tannergooding added api-ready-for-review and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Mar 12, 2020
@dovisutu dovisutu changed the title Add the ability to parse a hexadecimal number into a float/double Add the ability to parse/format a hexadecimal number into a float/double Mar 13, 2020
@dovisutu dovisutu changed the title Add the ability to parse/format a hexadecimal number into a float/double Add the ability to parse/format a float/double from/to a hexadecimal literal Mar 13, 2020
@dovisutu
Copy link
Author

(Sorry for the mess when I edit the title)

@reflectronic
Copy link
Contributor

Would this change also allow more combinations with AllowHexSpecifier? e.g. combining it with only AllowLeadingSign (without AllowDecimalPoint).

@tannergooding
Copy link
Member

I don't see why it wouldn't, provided it was still valid according to the IEEE requirements.

@dovisutu
Copy link
Author

I don't see why it wouldn't, provided it was still valid according to the IEEE requirements.

Yeah, it makes sense, but that is out of scope - which should be parsed into signed integer, not floating point nunber - so I think it should be proposed in a separate issue, though.

@tannergooding
Copy link
Member

I don't believe its out of scope, as I said as long as its still valid according to the IEEE requirements, it should be fine.

There are "paths" where the decimal point isn't required and so the user should fully be able to specify that hex is allowed but decimal-point is not. It's an advanced scenario, but the number parser already supports it and will continue doing so even if we add support for hex, so there is no reason to block it. The interesting scenario would be whether or not to allow the exponent to be optional and to default to 0 (which would still be valid based on the normal conversion rules, etc).

sign:           [+-]?
hexIndicator:   0[xX]
hexSignificand: (?:[\da-fA-F]*\.[\da-fA-F]+|[\da-fA-F]+\.?)
decExponent:    [pP][+-]?\d+
  • The sign is optional but must be + or -
  • The hex indicator is required and must be 0x or 0X
  • The hex significand is required and must be one of:
    • one or more hex digits
    • one or more hex digits followed by .
    • one or more hex characters followed by . followed by one or more hex digits
    • . followed by one or more hex digits
  • The decimal exponent is required and must be p or P followed by an optional sign followed by one or more decimal digits

@dovisutu
Copy link
Author

Oh, right. numbers like +0x23p1 can also be parsed as float... So this is OK to be valid (if users doesn't want to accept input that has .). You are right. Let Int.Parse() accept it, however, is out of scope.

But about exponent part... I saw an explanation of this being mandatory in the standard because it may create confusion like:

0x1.23e+1

While it clearly only parses into 1 state:(0x1.23e) + 1, but users may see it as a valid number.

But in c(++)'s implementation this is optional:
(strtod)

  • A 0x or 0X prefix, then a sequence of hexadecimal digits (as in isxdigit) optionally containing a period which separates the whole and fractional number parts. Optionally followed by a power of 2 exponent (a p or P character followed by an optional sign and a sequence of hexadecimal digits).

I think when parsing, let the exponent optional is OK, because you can't parse an expression into a floating-point number.

But it should be mandatory when we (possibly?) move this to a literal value in c#/vb/f# to avoid confusion. However, lefting it optional may still make sense because when users are aware of using this, they should already know that the exponent of hexadecimal float is p, not e.
This part I deleted is out of scope and should be considered by the language design group.

@dovisutu
Copy link
Author

dovisutu commented Apr 5, 2020

Updated the description based on the discussion.

@dovisutu
Copy link
Author

Sorry, but anything going on?

@tannergooding
Copy link
Member

This hasn't made it to API review yet. Issues that are critical to .NET 5 are being reviewed first and the general backlog is then reviewed from oldest to newest.

@dovisutu
Copy link
Author

OK, thanks. :D

@tannergooding tannergooding added this to the Future milestone Jun 23, 2020
@carlossanlop carlossanlop added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-ready-for-review labels Jul 6, 2020
@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 24, 2020
@terrajobst
Copy link
Contributor

terrajobst commented Jul 24, 2020

Video

  • API wise, the only change is the NumberStyles addition of HexFloat
  • Behavior wise, this changes parsing of float and double to support the x format specifier. This isn't a breaking change because float and double throw when x is passed.
  • We considered adding AllowHexPrefix because integers don't allow 0x as the prefix, while float and double would. However, for floats we want the prefix to be required, so AllowHexPrefix would necessary be a separate bit. Combining it with HexFloat would make it optional. Thus, we can do it later.
namespace System.Globalization
{
    public partial enum NumberStyles
    {
        HexFloat = AllowLeadingWhite | AllowTrailingWhite | AllowLeadingSign | AllowHexSpecifier | AllowDecimalPoint
    }
}

@pgovind
Copy link

pgovind commented Nov 20, 2020

Assigning myself to this.

@pgovind pgovind self-assigned this Nov 20, 2020
@danmoseley
Copy link
Member

@dovisutu are you interested in working on this? If not I will label up for grabs.

@dovisutu
Copy link
Author

@dovisutu are you interested in working on this? If not I will label up for grabs.

Well, nope, I don't think I have spare time (or adequate abilities?) to work on this. Fine to label as up-for-grabs.

@Korporal
Copy link

Korporal commented Dec 3, 2022

It seems that that regex might be wrong, I found this one that seems to behave correctly:

0[Xx]([0-9a-fA-F]*\.[0-9a-fA-F]+|[0-9a-fA-F]+\.?)[Pp][-+]?[0-9]+[flFL]?

From here: https://efxa.org/2014/05/10/regular-expressions-for-matching-data-values-in-compiler-lexers/

@tannergooding
Copy link
Member

It's possible its incorrect, but the ultimate logic won't be using a RegEx anyways

@Korporal
Copy link

Korporal commented Dec 6, 2022

It's possible its incorrect, but the ultimate logic won't be using a RegEx anyways

OK, is that because of performance considerations?

@stephentoub
Copy link
Member

Performance is certainly a factor, but the biggest reason is float/double live in System.Private.CoreLib, which can't depend on System.Text.RegularExpressions.dll

@Korporal
Copy link

Korporal commented Dec 6, 2022

Performance is certainly a factor, but the biggest reason is float/double live in System.Private.CoreLib, which can't depend on System.Text.RegularExpressions.dll

That's quite interesting, thanks for explaining!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

No branches or pull requests