-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Locale-invariant parsing and formatting #251
Conversation
2b7d3fd
to
22c22c0
Compare
cb6c237
to
af881b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, thanks for your nice work. I'm wondering whether there are any reference documents for this String format? Or is it created just for kotlinx.datetime
library? As a user, it might be tooooooo complex to have to learn specific syntax separately in order to just use a datetime library. And that makes it inconsistent with other languages.
For example, I hove my formatter yyyy-MM-dd HH:mm:ss
which works in Java/JavaScript/Python/MySQL/etc. also works here, instead of changing it to "ld<yyyy'-'mm'-'dd> lt<hh:mm:ss>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is it created just for kotlinx.datetime library?
Yes.
yyyy-MM-dd HH:mm:ss
This is the Unicode format: https://unicode-org.github.io/icu/userguide/format_parse/datetime/#datetime-format-syntax It is ubiquitous, as you say, but it has a fair share of issues. For example, many confuse YYYY
with yyyy
(or with uuuu
) and hh
with HH
. This leads to real issues that we try to avoid by providing a more intuitive and clear format.
However, we do plan to provide a way to construct a formatter using the Unicode format strings and some others, for the purposes of graceful migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for replying. I have to admit that the current Unicode version dose confuse sometimes, and maybe improving it is a good advice. And will it better to discuss with community about the new norm while (or before) continuing writing the code? I mean, it is a hard work because most people need to agree to the new style and gradually spread it (to other languages, or even to the new standard). (Kotlin's user is relatively not so much, especially for Kotlin Multiplatform now)
Hope everything works better in the future.
internal class DayDirective(minDigits: Int) : | ||
UnsignedIntFieldFormatDirective<DateFieldContainer>(DateFields.dayOfMonth, minDigits) | ||
|
||
internal object DateFormatBuilderSpec: BuilderSpec<DateFieldContainer>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it support some common used formats like Y, M, W/w
defined in Java's DateTimeFormatter in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
M
is just the month, formatting months is certainly a must-have. Regarding the other ones you listed, they are not at all common. Y
and w
are for week-based years (a representation of dates so niche that no one has even asked us to support it yet), and W
(week of month) is only used four times across thousands of repositories in which we checked the usage of format strings (for comparison, the day-of-month directive was used more than 20_000
times).
If you need Y
, W
, or w
, whether for formatting or as fields of LocalDate
, please file an issue describing your use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for detailed explanation, got it. I don't have the use cases, and it might be enough of what we already have.
5de7e01
to
5b0118c
Compare
put(LocalDateTime(-123456, 1, 1, 13, 44, 0, 0), ("-123456-01-01 13:44:00" to setOf())) | ||
} | ||
test(LocalDateTimeFormat.fromFormatString("ld<yyyy'-'mm'-'dd> lt<hh:mm:ss>"), dateTimes) | ||
test(LocalDateTimeFormat.build { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while looking on this test, I was thinking about API shape:
- why not to just use something like this
LocalDateTimeFormat { ... }
for building format? Similar to how kotlinx.serialization create formats (Json { ... }
) or buildingSerializersModule
. Future stdlibHexFormat
API also uses the same convention https://github.com/JetBrains/kotlin/blob/d3eef139c7b316aa9733f0272c5f8a5f8f844d30/libraries/stdlib/test/text/BytesHexFormatTest.kt#L28 - What is the reason to add
append
to all methods? I mean, if looking fromformatter
perspective it looks logical (because when formatting, we append elements, so it's ok), but if looking fromparser
perspective it looks odd. IMO, just droppingappend
word looks good from first sight, because then DSL looks like you are just describing the schema for formatting.
So if to apply my comments, creating format for this test will look:
LocalDateTimeFormat {
year(4)
literal('-')
monthNumber(2)
literal('-')
dayOfMonth(2)
literal(' ')
hour(2)
literal(':')
minute(2)
literal(':')
second(2)
}
If Im missing something, I would be interested to understand :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As is said in the opening post,
Most importantly: the naming and the shape of user-visible API needs to be discussed.
So, you may well be right, and I can't give an authoritative answer, just explain the reasoning used in the draft. So far, the naming is this way because we'll want to extend the API with localized formats: #253
why not to just use something like this
LocalDateTimeFormat { ... }
for building format
I think this format implies that every LocalDateTimeFormat
can be constructed with this builder, but this may change in the future. I think it doesn't make sense to mix machine communication formats with human-readable formats in a single constructor, given how disparate the use cases for them are, but it also probably would be excessive to have an entirely separate LocalDateTimeLocalizedFormat
class. Two distinct functions, build
and buildLocalized
, seem disparate enough.
Also, we don't have API like String { +"x"; +"y" }
. Instead, it's buildString { append("x"); append("y") }
.
What is the reason to add
append
to all methods?
It's append
ing to the format. First, the format is "four-digit year," then, it's "four-digit year, followed by a -
," and so on. It's called append
to signify that the order is important. When showing a message to the user, you want to reorder the format components, but when parsing logs, you very much don't. If it's appendYear(4); appendLiteral('-'); appendMonth(2)
, it's clear that it won't be reordered to appendMonth(2); appendLiteral('-'); appendYear(4)
. If it was just year
, literal
, and month
, I wouldn't bet one way or the other.
Amazing piece of work. You could get meantime valuable feedback about API/bugs and community might help to add more features. |
I have a few questions. How is this library going to support other calendars (like Solar calendar)? JavaScript supports formatting dates to other calendars based on the given locale: let formatted = new Date().toLocaleDateString('fa'); See https://stackoverflow.com/q/35570884/8583692 Also, does it correctly output digits based on the locale? I'm saying this because I've seen this problem in other places too. |
"locale-invariant" means that locales are not supported. This implementation is for machine-machine communications, like parsing logs or formatting dates in a specific format for API endpoints. For a discussion of localized formatting capabilities, see #253 |
Status: the API was discussed, and the changes are implemented. The only thing left is to decide on the names of functions and such, with maybe small tweaks here and there. This PR has names like |
a7fbcc4
to
1069917
Compare
532d49c
to
684a03d
Compare
6214827
to
30b182b
Compare
972231d
to
48e1d9f
Compare
second() | ||
optional { | ||
char('.') | ||
secondFractionInternal(1, 9, FractionalSecondDirective.GROUP_BY_THREE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks concerning that it is not possible to construct a standard format without resorting to an internal function. Or is there a public alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completed stage 1 of review: public API and tests.
So far the most concerning part is the usage of internal functions when building the standard formats. This would make it hard for users to copy-paste and tweak them.
* https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/time/format/DateTimeFormatterBuilder.html#appendPattern(java.lang.String) | ||
* https://unicode-org.github.io/icu/userguide/format_parse/datetime/#datetime-format-syntax | ||
*/ | ||
internal sealed interface UnicodeFormat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not blocking this PR, but I wonder, would it be possible to replace such data structure with a single visitor if it is only used once to build our format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, sure, via Church encoding, it's always possible for non-polymorphic types.
internal interface UnicodeFormatInterpreter<Result> {
fun literal(string: String): Result
fun sequence(listCollector: List<Result>): Result
fun optionalGroup(inner: Result): Result
}
internal object UnicodeFormatInterpreterInDateTimeFormatBuilder: UnicodeFormatInterpreter<(DateTimeFormatBuilder) -> Unit> {
override fun literal(string: String): (DateTimeFormatBuilder) -> Unit = { it.chars(string) }
override fun optionalGroup(inner: (DateTimeFormatBuilder) -> Unit): (DateTimeFormatBuilder) -> Unit = {
it.alternativeParsing({}) { inner(this) }
}
override fun sequence(listCollector: List<(DateTimeFormatBuilder) -> Unit>): (DateTimeFormatBuilder) -> Unit = {
listCollector.forEach { subFormat -> subFormat(it) }
}
}
Or we can recreate the standard mutable approach to visitors that allows us to reify the stack and avoid all the lambdas:
internal interface UnicodeFormatInterpreter<State> {
fun literal(state: State, string: String)
fun enterOptionalGroup(state: State)
fun leaveOptionalGroup(state: State)
}
internal class UnicodeFormatInterpreterInDateTimeFormatBuilder<T, E: AbstractDateTimeFormatBuilder<T, E>>:
UnicodeFormatInterpreter<MutableList<AbstractDateTimeFormatBuilder<T, E>>>
{
override fun literal(state: MutableList<AbstractDateTimeFormatBuilder<T, E>>, string: String) {
state.last().chars(string)
}
override fun enterOptionalGroup(state: MutableList<AbstractDateTimeFormatBuilder<T, E>>) {
state.add(state.last().createEmpty())
}
override fun leaveOptionalGroup(state: MutableList<AbstractDateTimeFormatBuilder<T, E>>) {
val main = state.removeLast().actualBuilder.build()
val others = listOf(ConcatenatedFormatStructure(emptyList<NonConcatenatedFormatStructure<T>>()))
state.last().actualBuilder.add(AlternativesParsingFormatStructure(main, others))
}
}
/** | ||
* The fractional part of the second without the leading dot. | ||
* | ||
* When formatting, the decimal fraction will round the number to fit in the specified [maxLength] and will add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be worth to mention rounding mode used in secondFraction
overloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those overloads are internal, so no reason to mention them from the public documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see, secondFraction
overloads are public here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also may be better "the decimal fraction will be rounded"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you mean these overloads. Sure, added the mentions and improved the wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still is worth to mention the rounding mode. HALF_UP, I suppose?
b9cb76c
to
db723fa
Compare
Ok, I removed this functionality from the standard formats and also filed an issue to investigate whether we should also stop doing this in |
@Benchmark | ||
fun formatIso() = LocalDateTime.Formats.ISO.format(localDateTime) | ||
|
||
@Benchmark | ||
fun parseIso() = LocalDateTime.Formats.ISO.parse(formatted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it useful to commit the benchmark set in this state?
Also, I'd prefer if we dogfood kotlinx-benchmark here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do run them occasionally.
kotlinx-benchmark
Maybe we could do that, but I think it's a problem for after we've published a release with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just know that kotlinx-benchmark is not ready outside of JVM. Too many issues still, especially in JS.
But I'd be happy to see it adopted here, it may speed up the resolution of such issues.
Before this change, we had two `parse` overloads: one that accepted an ISO string and one that accepted some string and the corresponding formatter. Now, we only have one overload that uses the ISO format by default. As a workaround for KT-65484, we have to hide the actual default parameters behind a redundant `internal` function.
Before this change, you'd observe `date(LocalDate.Formats.ISO)char('T')`, without a newline in-between.
Also check that Java returns the same string as toUnicodePattern()
…ument the remaining differences
Merged in #343 |
Fixes #39
Fixes #58
Fixes #90
Fixes #128
Fixes #133
Fixes #139
Fixes #211
Fixes #240
Fixes #83
Fixes #276