-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#991 Support Relative Date Times #1006
base: main
Are you sure you want to change the base?
#991 Support Relative Date Times #1006
Conversation
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
ppl-spark-integration/src/main/java/org/opensearch/sql/expression/function/TimeUtils.java
Outdated
Show resolved
Hide resolved
ppl-spark-integration/src/test/java/org/opensearch/sql/expression/function/TimeUtilsTest.java
Outdated
Show resolved
Hide resolved
testValid("@w6", "2000-01-01T00:00"); | ||
testValid("@w7", "2000-01-02T00:00"); | ||
|
||
testInvalid("@INVALID", "The relative date time unit 'INVALID' is not supported."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you will need to test a lot more invalid cases (such as "@W8") to satisfy implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also like to see more complicated use cases such as
source=table | WHERE earliest=-5d@w1 AND latest=@w6 | ...
I remembered we will not be supporting this functionality in the first iteration - can you please add this test and mark it as ignore
until we add this capability ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@acarbonetto I have added this test case, as well as a few more invalid ones. If you have any more in mind, please let me know.
@YANG-DB. I have added ignored tests for earliest
and latest
in FlintSparkPPLBuiltInDateTimeFunctionITSuite
. That being said, more complex integration tests would, I think, require mocking/overriding the current time in order to be stable. Let me know if you think this is something I should pursue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can leverage the timestampdiff
to test more complex cases in IT:
timestampdiff(HOUR, relative_timestamp("+1d"), relative_timestamp("+1h"))
timestampdiff(HOUR, relative_timestamp("-1h@w3"), relative_timestamp("@d"))
And something else, they should be stable.
…unit tests. Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
testValid("@w6", "2000-01-01T00:00"); | ||
testValid("@w7", "2000-01-02T00:00"); | ||
|
||
testInvalid("@INVALID", "The relative date time unit 'INVALID' is not supported."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also like to see more complicated use cases such as
source=table | WHERE earliest=-5d@w1 AND latest=@w6 | ...
I remembered we will not be supporting this functionality in the first iteration - can you please add this test and mark it as ignore
until we add this capability ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also add such time based relative dates queries in the example commands doc
Add a new relative time query filter example to the ppl where clause documentation
…sponding unit tests in `SerializableTimeUdfTest`. Add `mockito-inline` to dependencies for `ppl-spark-integration` to allow mocking of current datetime. Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
…st and latest. Signed-off-by: currantw <[email protected]>
Done! Let me know if this is what you had in mind. |
// TODO #957: Support earliest | ||
ignore("test EARLIEST") { | ||
var frame = sql(s""" | ||
| source = $testTable | ||
| | eval earliest_hour_before = earliest(now(), "-1h") | ||
| | eval earliest_now = earliest(now(), "now") | ||
| | eval earliest_hour_after = earliest(now(), "+1h") | ||
| | fields earliest_hour_before, earliest_now, earliest_hour_after | ||
| | head 1 | ||
| """.stripMargin) | ||
assertSameRows(Seq(Row(true), Row(true), Row(false)), frame) | ||
} | ||
|
||
// TODO #957: Support latest | ||
ignore("test LATEST") { | ||
var frame = sql(s""" | ||
| source = $testTable | ||
| | eval latest_hour_before = latest(now(), "-1h") | ||
| | eval latest_now = latest(now(), "now") | ||
| | eval latest_hour_after = latest(now(), "+1h") | ||
| | fields latest_hour_before, latest_now, latest_hour_after | ||
| | head 1 | ||
| """.stripMargin) | ||
assertSameRows(Seq(Row(false), Row(true), Row(true)), frame) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@acarbonetto Any idea on better tests for earliest
and latest
that don't require mocking the current time for testing? Ultimately, earliest
and latest
are really only wrappers around relative_timestamp
function (earliest(field_name, "-1h@d")
is equivalent to field_name >= relative_timestamp("-1h@d")
, so it seems fine to me as long as we have pretty robust unit tests for relative timestamp. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests are cheap - so it's okay to have duplicate tests.
Since IT tests are mostly focusing on testing the API and integration, I don't thinks that's overly valuable to test the backend logic. Leave that to Unit Tests where mocking is easily done.
If you need to mock the IT test backend, you're probably not doing testing correctly.
Signed-off-by: currantw <[email protected]>
The relative time string has syntax `[+|-]<offset_time_integer><offset_time_unit>@<snap_time_unit>`, and is made up of | ||
two optional components: | ||
* An offset from the current timestamp at the start of query execution, which is composed of a sign (`+` or `-`), an | ||
optional time integer, and a time unit. If the time integer is not specified, it defaults to one. For example, `+2hr` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional time integer, and a time unit. If the time integer is not specified, it defaults to one. For example, `+2hr` | |
`offset_time_integer` integer, and `offset_time_unit` a time unit. If the time integer is not specified, it defaults to`1`. For example, `+2hr` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done ✅
* An offset from the current timestamp at the start of query execution, which is composed of a sign (`+` or `-`), an | ||
optional time integer, and a time unit. If the time integer is not specified, it defaults to one. For example, `+2hr` | ||
corresponds to two hours after the current timestamp, while `-mon` corresponds to one month ago. | ||
* A snap-to time using the `@` symbol followed by a time unit. The snap-to time is applied after the offset (if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* A snap-to time using the `@` symbol followed by a time unit. The snap-to time is applied after the offset (if | |
* A snap-to time using the `@` symbol followed by `snap_time_unit` - a time unit. The snap-to time is applied after the offset (if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done ✅
optional time integer, and a time unit. If the time integer is not specified, it defaults to one. For example, `+2hr` | ||
corresponds to two hours after the current timestamp, while `-mon` corresponds to one month ago. | ||
* A snap-to time using the `@` symbol followed by a time unit. The snap-to time is applied after the offset (if | ||
specified), and rounds the time <i>down</i> to the start of the specified time unit (i.e. backwards in time). For |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specified), and rounds the time <i>down</i> to the start of the specified time unit (i.e. backwards in time). For | |
specified), and rounds the time <i>down</i> to the start of the specified time unit. For |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✅
specified), and rounds the time <i>down</i> to the start of the specified time unit (i.e. backwards in time). For | ||
example, `@wk` corresponds to the start of the current week (Sunday is considered to be the first day of the week). | ||
|
||
The following offset time units are supported: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case-sensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call - forgot to mention that! I have added a line a bit further down that "the entire
relative timestamp string is case-insensitive", and also updated two of the example to use capital letters to illustrate this
| Quarters | `q`, `qtr`, `qtrs`, `quarter`, `quarters` | | ||
| Years | `y`, `yr`, `yrs`, `year`, `years` | | ||
|
||
The snap-to time supports all the time units above, as well as the following day of the week time units: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The snap-to time supports all the time units above, as well as the following day of the week time units: | |
The snap-to time supports all the time units above, as well as the following day-of-the-week time units: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✅
| Friday | `w5` | | ||
| Saturday | `w6` | | ||
|
||
The special relative time string `now` for the current timestamp is also supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be mentioned higher up (like on the first line of the usage/description).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention that now
is calculated once at the start of the query execution, and used for relative datetime calculations for that query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait... can you verify the above comment? If you print out now()
twice, does it give exactly the same result (down to the millisecond)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the mention of now
closer to the start, and specified that the current timestamp is the same for all relative timestamps in the same query. Let me know what you think.
import org.apache.spark.sql.catalyst.expressions.TimestampDiff$; | ||
import org.apache.spark.sql.catalyst.expressions.ToUTCTimestamp$; | ||
import org.apache.spark.sql.catalyst.expressions.UnaryMinus$; | ||
import org.apache.spark.sql.catalyst.expressions.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to have them all listed out explicitly. apache.spark.sql.catalyst probably has hundreds of functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops - reverted (and updated my setting for this project so IntelliJ doesn't keep trying to "help" me out with this...)
} | ||
|
||
String message = String.format("The relative date time unit '%s' is not supported.", offsetUnit); | ||
throw new RuntimeException(message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use IllegalArgumentException
install
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated ✅
Matcher matcher = RELATIVE_PATTERN.matcher(relativeString); | ||
if (!matcher.matches()) { | ||
String message = String.format("The relative date time '%s' is not supported.", relativeString); | ||
throw new RuntimeException(message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use IllegalArgumentException
install
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✅
testValid("@w6", "2000-01-01T00:00"); | ||
testValid("@w7", "2000-01-02T00:00"); | ||
|
||
testInvalid("@INVALID", "The relative date time unit 'INVALID' is not supported."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can leverage the timestampdiff
to test more complex cases in IT:
timestampdiff(HOUR, relative_timestamp("+1d"), relative_timestamp("+1h"))
timestampdiff(HOUR, relative_timestamp("-1h@w3"), relative_timestamp("@d"))
And something else, they should be stable.
…eturned, but output from `$CurrentTimestamp` is an `Instant` Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw [email protected]
Description
Adds support for relative date times via the new user-defined method
relative_datetime
.Related Issues
Resolves #991.
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.