Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#991 Support Relative Date Times #1006

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

currantw
Copy link
Contributor

@currantw currantw commented Jan 3, 2025

Signed-off-by: currantw [email protected]

Description

Adds support for relative date times via the new user-defined method relative_datetime.

Related Issues

Resolves #991.

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • New added source code should include a copyright header
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

testValid("@w6", "2000-01-01T00:00");
testValid("@w7", "2000-01-02T00:00");

testInvalid("@INVALID", "The relative date time unit 'INVALID' is not supported.");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will need to test a lot more invalid cases (such as "@W8") to satisfy implementation.

Copy link
Member

@YANG-DB YANG-DB Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also like to see more complicated use cases such as
source=table | WHERE earliest=-5d@w1 AND latest=@w6 | ...
I remembered we will not be supporting this functionality in the first iteration - can you please add this test and mark it as ignore until we add this capability ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@acarbonetto I have added this test case, as well as a few more invalid ones. If you have any more in mind, please let me know.

@YANG-DB. I have added ignored tests for earliest and latest in FlintSparkPPLBuiltInDateTimeFunctionITSuite. That being said, more complex integration tests would, I think, require mocking/overriding the current time in order to be stable. Let me know if you think this is something I should pursue.

Copy link
Member

@LantaoJin LantaoJin Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can leverage the timestampdiff to test more complex cases in IT:
timestampdiff(HOUR, relative_timestamp("+1d"), relative_timestamp("+1h"))
timestampdiff(HOUR, relative_timestamp("-1h@w3"), relative_timestamp("@d"))
And something else, they should be stable.

testValid("@w6", "2000-01-01T00:00");
testValid("@w7", "2000-01-02T00:00");

testInvalid("@INVALID", "The relative date time unit 'INVALID' is not supported.");
Copy link
Member

@YANG-DB YANG-DB Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also like to see more complicated use cases such as
source=table | WHERE earliest=-5d@w1 AND latest=@w6 | ...
I remembered we will not be supporting this functionality in the first iteration - can you please add this test and mark it as ignore until we add this capability ...

Copy link
Member

@YANG-DB YANG-DB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also add such time based relative dates queries in the example commands doc

Add a new relative time query filter example to the ppl where clause documentation

…sponding unit tests in `SerializableTimeUdfTest`. Add `mockito-inline` to dependencies for `ppl-spark-integration` to allow mocking of current datetime.

Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
@currantw
Copy link
Contributor Author

currantw commented Jan 9, 2025

please also add such time based relative dates queries in the example commands doc

Add a new relative time query filter example to the ppl where clause documentation

Done! Let me know if this is what you had in mind.

Comment on lines 397 to 421
// TODO #957: Support earliest
ignore("test EARLIEST") {
var frame = sql(s"""
| source = $testTable
| | eval earliest_hour_before = earliest(now(), "-1h")
| | eval earliest_now = earliest(now(), "now")
| | eval earliest_hour_after = earliest(now(), "+1h")
| | fields earliest_hour_before, earliest_now, earliest_hour_after
| | head 1
| """.stripMargin)
assertSameRows(Seq(Row(true), Row(true), Row(false)), frame)
}

// TODO #957: Support latest
ignore("test LATEST") {
var frame = sql(s"""
| source = $testTable
| | eval latest_hour_before = latest(now(), "-1h")
| | eval latest_now = latest(now(), "now")
| | eval latest_hour_after = latest(now(), "+1h")
| | fields latest_hour_before, latest_now, latest_hour_after
| | head 1
| """.stripMargin)
assertSameRows(Seq(Row(false), Row(true), Row(true)), frame)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@acarbonetto Any idea on better tests for earliest and latest that don't require mocking the current time for testing? Ultimately, earliest and latest are really only wrappers around relative_timestamp function (earliest(field_name, "-1h@d") is equivalent to field_name >= relative_timestamp("-1h@d"), so it seems fine to me as long as we have pretty robust unit tests for relative timestamp. Thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests are cheap - so it's okay to have duplicate tests.

Since IT tests are mostly focusing on testing the API and integration, I don't thinks that's overly valuable to test the backend logic. Leave that to Unit Tests where mocking is easily done.

If you need to mock the IT test backend, you're probably not doing testing correctly.

Signed-off-by: currantw <[email protected]>
build.sbt Outdated Show resolved Hide resolved
The relative time string has syntax `[+|-]<offset_time_integer><offset_time_unit>@<snap_time_unit>`, and is made up of
two optional components:
* An offset from the current timestamp at the start of query execution, which is composed of a sign (`+` or `-`), an
optional time integer, and a time unit. If the time integer is not specified, it defaults to one. For example, `+2hr`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
optional time integer, and a time unit. If the time integer is not specified, it defaults to one. For example, `+2hr`
`offset_time_integer` integer, and `offset_time_unit` a time unit. If the time integer is not specified, it defaults to`1`. For example, `+2hr`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done ✅

* An offset from the current timestamp at the start of query execution, which is composed of a sign (`+` or `-`), an
optional time integer, and a time unit. If the time integer is not specified, it defaults to one. For example, `+2hr`
corresponds to two hours after the current timestamp, while `-mon` corresponds to one month ago.
* A snap-to time using the `@` symbol followed by a time unit. The snap-to time is applied after the offset (if

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* A snap-to time using the `@` symbol followed by a time unit. The snap-to time is applied after the offset (if
* A snap-to time using the `@` symbol followed by `snap_time_unit` - a time unit. The snap-to time is applied after the offset (if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done ✅

optional time integer, and a time unit. If the time integer is not specified, it defaults to one. For example, `+2hr`
corresponds to two hours after the current timestamp, while `-mon` corresponds to one month ago.
* A snap-to time using the `@` symbol followed by a time unit. The snap-to time is applied after the offset (if
specified), and rounds the time <i>down</i> to the start of the specified time unit (i.e. backwards in time). For

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
specified), and rounds the time <i>down</i> to the start of the specified time unit (i.e. backwards in time). For
specified), and rounds the time <i>down</i> to the start of the specified time unit. For

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✅

specified), and rounds the time <i>down</i> to the start of the specified time unit (i.e. backwards in time). For
example, `@wk` corresponds to the start of the current week (Sunday is considered to be the first day of the week).

The following offset time units are supported:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case-sensitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - forgot to mention that! I have added a line a bit further down that "the entire
relative timestamp string is case-insensitive", and also updated two of the example to use capital letters to illustrate this

| Quarters | `q`, `qtr`, `qtrs`, `quarter`, `quarters` |
| Years | `y`, `yr`, `yrs`, `year`, `years` |

The snap-to time supports all the time units above, as well as the following day of the week time units:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The snap-to time supports all the time units above, as well as the following day of the week time units:
The snap-to time supports all the time units above, as well as the following day-of-the-week time units:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✅

| Friday | `w5` |
| Saturday | `w6` |

The special relative time string `now` for the current timestamp is also supported.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be mentioned higher up (like on the first line of the usage/description).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention that now is calculated once at the start of the query execution, and used for relative datetime calculations for that query.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait... can you verify the above comment? If you print out now() twice, does it give exactly the same result (down to the millisecond)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait... can you verify the above comment? If you print out now() twice, does it give exactly the same result (down to the millisecond)?

Yes, down to the millisecond:

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the mention of now closer to the start, and specified that the current timestamp is the same for all relative timestamps in the same query. Let me know what you think.

import org.apache.spark.sql.catalyst.expressions.TimestampDiff$;
import org.apache.spark.sql.catalyst.expressions.ToUTCTimestamp$;
import org.apache.spark.sql.catalyst.expressions.UnaryMinus$;
import org.apache.spark.sql.catalyst.expressions.*;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to have them all listed out explicitly. apache.spark.sql.catalyst probably has hundreds of functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops - reverted (and updated my setting for this project so IntelliJ doesn't keep trying to "help" me out with this...)

}

String message = String.format("The relative date time unit '%s' is not supported.", offsetUnit);
throw new RuntimeException(message);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use IllegalArgumentException install

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated ✅

Matcher matcher = RELATIVE_PATTERN.matcher(relativeString);
if (!matcher.matches()) {
String message = String.format("The relative date time '%s' is not supported.", relativeString);
throw new RuntimeException(message);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use IllegalArgumentException install

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✅

testValid("@w6", "2000-01-01T00:00");
testValid("@w7", "2000-01-02T00:00");

testInvalid("@INVALID", "The relative date time unit 'INVALID' is not supported.");
Copy link
Member

@LantaoJin LantaoJin Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can leverage the timestampdiff to test more complex cases in IT:
timestampdiff(HOUR, relative_timestamp("+1d"), relative_timestamp("+1h"))
timestampdiff(HOUR, relative_timestamp("-1h@w3"), relative_timestamp("@d"))
And something else, they should be stable.

…eturned, but output from `$CurrentTimestamp` is an `Instant`

Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Support Relative Date-time Strings
4 participants