Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6551] Add DATE_FORMAT function (enabled in MySQL library) #3936

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

suibianwanwank
Copy link
Contributor

f.checkString("date_format(timestamp '1997-10-04 22:23:00.000000', '%c %D %e %f %h')",
"10 4th 4 000000 10",
"VARCHAR NOT NULL");
f.checkString("date_format(timestamp '1997-01-04 22:23:00', '%w %U %u %V %v %X %x')",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are 4 cases testing mysql's weekForYear in two modes.
https://stackoverflow.com/questions/30364141/mysql-convert-yearweek-to-date

Comment on lines 5237 to 5238
// f.checkString("date_format('1999-13-01', '%X %V')",
// null,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MySQL supports this writing and will return null if the timestamp type is not declared and the parameter does not satisfy the timestamp format, which seems to be difficult to do in calcite

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what you need is to use an implicit cast to timestamp if the input is a CHAR data type.
you should be able to support that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, implicit cast to timestamp this can be done in calcite, but if this cast fails, calcite will throw an exception, but for mysql, it will return null.
This is something that I don't think is easy to implement. unless we add a Config that adds a Config in the checksum phase and returns NULL if the implicit cast fails. WDYT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the date is incorrect.
In this case indeed you can probably not match exactly the behavior of mysql.
But I think you should still make it a negative test and leave the comment in.

@@ -4919,7 +4920,8 @@ void testBitGetFunc(SqlOperatorFixture f, String functionName) {
}

@Test void testToCharPg() {
final SqlOperatorFixture f = fixture().withLibrary(SqlLibrary.POSTGRESQL);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the default tester doesn't check the results, so I added this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check the results by CalciteSqlOperatorTest. In SqlOperatorTest we check the return type.

@suibianwanwank
Copy link
Contributor Author

The local time zone causes incorrect results and I need to check how to fix it.

@mihaibudiu
Copy link
Contributor

If you want deterministic tests I think you should not use the local timezone.

@asolimando
Copy link
Member

If you want deterministic tests I think you should not use the local timezone.

@mihaibudiu is right, tests should be timezone agnostic, see Cassandra tests for an example.

@suibianwanwank
Copy link
Contributor Author

@mihaibudiu @asolimando I found this problem with all of this test classes, if the time zone is changed arbitrarily. For Calender's test, we need to set the time zone at this time like SqlOperatorTest. I'm not sure if this should be fixed in this PR .

@mihaibudiu
Copy link
Contributor

You can see a solution in e.g., #3805, where the timezone is set before the tests and reset afterwards. It's sufficient to make your own tests pass.

@suibianwanwank
Copy link
Contributor Author

You can see a solution in e.g., #3805, where the timezone is set before the tests and reset afterwards. It's sufficient to make your own tests pass.

I've fixed my test by keeping the data consistent with other tests. But I think in some time zones, some tests may not pass if we don't specify the time zone.

@Override public void format(StringBuilder sb, Date date) {
final Calendar calendar = Work.get().calendar;
calendar.setTime(date);
// It exceeds the precision of Calendar, we fill it with 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Milliseconds are usually between 0 and 999.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an incorrect function description, it means microsecond here, I'll fix it

@@ -221,6 +245,33 @@ public enum FormatElementEnum implements FormatElement {
sb.append(String.format(Locale.ROOT, "%02d", calendar.get(Calendar.WEEK_OF_YEAR)));
}
},
V("", "The number of the year (Monday as the first day of the week),"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment seems wrong, it is about the week number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, i will fix it

@@ -399,6 +480,27 @@ public enum FormatElementEnum implements FormatElement {
sb.append(work.yyyyFormat.format(date));
}
},
WFY("", "The year for the week where Sunday is the first day of the week") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should say "the week of the year"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result returned by this symbol is the number of years, "the week of the year" might not be appropriate

sb.append(String.format(Locale.ROOT, "%04d", calendar.getWeekYear()));
}
},
WFY0("", "The year for the week where Monday is the first day of the week") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same problem

final Locale originalLocale = Locale.getDefault();

try {
Locale.setDefault(Locale.US);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not need to set the locale in the test if these functions are supposed to be deterministic.
Perhaps the function implementation should set the locale.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For datetime functions, the implementation should be based on the user's time zone, just as other arbitrary databases support setting time zones. And other test do it.
And other tests have done the same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three distinct data types in Calcite:
TIMESTAMP
TIMESTAMP WITH LOCAL TIME ZONE
TIMESTAMP WITH TIME ZONE

You have to specify the type these functions operate on. I think you probably want functions for each data type. The ones for TIMESTAMP and TIMESTAMP WITH TIMEZONE should be deterministic. And the ones for LOCAL TIME ZONE should be implemented in terms of the other ones by setting the timezone first. But you should not rely on the Java system timezone for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this require more testing, including for BC dates.

Comment on lines 5237 to 5238
// f.checkString("date_format('1999-13-01', '%X %V')",
// null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what you need is to use an implicit cast to timestamp if the input is a CHAR data type.
you should be able to support that.

"VARCHAR NOT NULL");
f.checkString("date_format(timestamp '1997-10-04 22:23:00', '%V %v %X %x %%')",
"39 40 1997 1997 %",
"VARCHAR NOT NULL");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some tests with dates before the Gregorian calendar?
I found that some of these fail.
Also, can you say how you validated the outputs?
Are these tested in MySQL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It throws an exception:Illegal TIMESTAMP literal, In the timestamp scenario, this is consistent with mysql's result.
Yes, I do these tests in mysql . In fact, I write these test cases in mysql and get the results, and then add them to test to validate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Gregorian Calendar starts in 1582: https://en.wikipedia.org/wiki/Gregorian_calendar
I don't expect that mysql fails in this case.
Perhaps it fails for dates BC.


try {
Locale.setDefault(Locale.US);
f.checkString("date_format(timestamp '2009-10-04 22:23:00', '%W %M %Y')",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does 0000-01-01 or BC date support format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this example in mysql.
SELECT DATE_FORMAT('0000-01-01', '%X %V');
return
0000-01
but
SELECT DATE_FORMAT(timestamp '0000-01-01', '%X %V');
return error: Incorrect DATETIME value: '0000-01-01'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It throws an exception:Illegal TIMESTAMP literal, In the timestamp scenario, this is consistent with mysql's result.

In the Calcite test, this throws an exception because it can't be converted to a timestamp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Calcite has its own date standard.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT DATE_FORMAT(timestamp '0000-01-01', '%X %V');

This failed because '0000-01-01' is not a timestamp. You can try:

SELECT DATE_FORMAT(data '0000-01-01', '%X %V');

Copy link
Member

@caicancai caicancai Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT DATE_FORMAT(timestamp '0000-01-01', '%X %V');

This failed because '0000-01-01' is not a timestamp. You can try:

SELECT DATE_FORMAT(data '0000-01-01', '%X %V');

0000-01-01 is not even a valid date according to the SQL standard. 0000-01-01 is not even a valid date according to the SQL standard. It is difficult to implement the date_format standard in all dialects. Maybe we need to make some decisions

Copy link

@@ -2944,6 +2944,7 @@ In the following:
| b | TIME_SUB(time, interval) | Returns the TIME value that is *interval* before *time*
| b | TIME_TRUNC(time, timeUnit) | Truncates *time* to the granularity of *timeUnit*, rounding to the beginning of the unit
| m o p r | TO_CHAR(timestamp, format) | Converts *timestamp* to a string using the format *format*
| m | DATE_FORMAT(timestamp, format) | Converts *timestamp* to a string using the format *format*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do DATE-FORMAT and TO_CHAR have the same functional description? Are their functions confirmed to be consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think they're the same. In fact, Mysql does not have a to_char function.

{@code TO_CHAR} is not supported in MySQL, but it is supported in * MariaDB, a variant of MySQL covered by {@link SqlLibrary#MYSQL}.

@@ -4919,7 +4920,8 @@ void testBitGetFunc(SqlOperatorFixture f, String functionName) {
}

@Test void testToCharPg() {
final SqlOperatorFixture f = fixture().withLibrary(SqlLibrary.POSTGRESQL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check the results by CalciteSqlOperatorTest. In SqlOperatorTest we check the return type.


try {
Locale.setDefault(Locale.US);
f.checkString("date_format(timestamp '2009-10-04 22:23:00', '%W %M %Y')",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT DATE_FORMAT(timestamp '0000-01-01', '%X %V');

This failed because '0000-01-01' is not a timestamp. You can try:

SELECT DATE_FORMAT(data '0000-01-01', '%X %V');

f.checkString("date_format(timestamp '1997-10-04 22:23:00', '%H %k %I %r %T %S %w')",
"22 22 10 10:23:00 PM 22:23:00 00 6",
"VARCHAR NOT NULL");
f.checkString("date_format(timestamp '1999-01-01', '%X %V')",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MySQL,This will throw exception ERROR 1525 (HY000) at line 1: Incorrect DATETIME value: '1994-01-01'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I removed time to simplify testing Case, and in fact, Mysql doesn't support doing that. I'll add time to this test, but for this current test, I'm not sure we can be consistent with Mysql in Calcite

Copy link
Contributor Author

@suibianwanwank suibianwanwank Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check the results by CalciteSqlOperatorTest. In SqlOperatorTest we check the return type.

It looks like CalciteSqlOperatorTest is better suited for this test, do I need to modify this test into CalciteSqlOperatorTest as this looks like it needs result checking too

@suibianwanwank suibianwanwank marked this pull request as draft September 3, 2024 11:11
Copy link

github-actions bot commented Dec 5, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants