#991 Support Relative Date Times #1006

Open · wants to merge 16 commits into base: main
1 change: 1 addition & 0 deletions build.sbt
@@ -194,6 +194,7 @@ lazy val pplSparkIntegration = (project in file("ppl-spark-integration"))
"com.github.sbt" % "junit-interface" % "0.13.3" % "test",
"org.projectlombok" % "lombok" % "1.18.30",
"com.github.seancfoley" % "ipaddress" % "5.5.1",
"org.mockito" % "mockito-inline" % "4.6.0" % "test",
),
libraryDependencies ++= deps(sparkVersion),
// ANTLR settings
5 changes: 5 additions & 0 deletions docs/ppl-lang/PPL-Example-Commands.md
@@ -494,4 +494,9 @@ _- **Limitation: another command usage of (relation) subquery is in `appendcols`
- `source = table | eval cdate = CAST('2012-08-07' as date), ctime = cast('2012-08-07T08:07:06' as timestamp) | fields cdate, ctime`
- `source = table | eval chained_cast = cast(cast("true" as boolean) as integer) | fields chained_cast`

#### **relative_timestamp**
[See additional function details](functions/ppl-datetime#RELATIVE_TIMESTAMP)
- `source = table | eval one_hour_ago = relative_timestamp("-1h") | where timestamp < one_hour_ago`
- `source = table | eval start_of_today = relative_timestamp("@d") | where timestamp > start_of_today`
- `source = table | eval last_saturday = relative_timestamp("-1d@w6") | where timestamp >= last_saturday`
---
83 changes: 83 additions & 0 deletions docs/ppl-lang/functions/ppl-datetime.md
@@ -733,6 +733,89 @@ Example:
| 3 |
+-------------------------------+

### `RELATIVE_TIMESTAMP`

**Description:**


**Usage:** relative_timestamp(str) returns the timestamp corresponding to the given relative time string, evaluated
against the current timestamp at the time of query execution.

The relative time string has syntax `[+|-]<offset_time_integer><offset_time_unit>@<snap_time_unit>`, and is made up of
two optional components:
* An offset from the current timestamp at the start of query execution, which is composed of a sign (`+` or `-`), an
optional time integer, and a time unit. If the time integer is not specified, it defaults to one. For example, `+2hr`

Suggested change:
- optional time integer, and a time unit. If the time integer is not specified, it defaults to one. For example, `+2hr`
+ `offset_time_integer` integer, and `offset_time_unit` a time unit. If the time integer is not specified, it defaults to `1`. For example, `+2hr`

Contributor Author: Thanks, done ✅

corresponds to two hours after the current timestamp, while `-mon` corresponds to one month ago.
* A snap-to time using the `@` symbol followed by a time unit. The snap-to time is applied after the offset (if

Suggested change:
- * A snap-to time using the `@` symbol followed by a time unit. The snap-to time is applied after the offset (if
+ * A snap-to time using the `@` symbol followed by `snap_time_unit` - a time unit. The snap-to time is applied after the offset (if

Contributor Author: Thanks, done ✅

specified), and rounds the time <i>down</i> to the start of the specified time unit (i.e. backwards in time). For

Suggested change:
- specified), and rounds the time <i>down</i> to the start of the specified time unit (i.e. backwards in time). For
+ specified), and rounds the time <i>down</i> to the start of the specified time unit. For

Contributor Author: Done ✅

example, `@wk` corresponds to the start of the current week (Sunday is considered to be the first day of the week).

The following offset time units are supported:

Reviewer: case-sensitive?

Contributor Author: Good call - forgot to mention that! I have added a line a bit further down stating that "the entire relative timestamp string is case-insensitive", and also updated two of the examples to use capital letters to illustrate this.

| Time Unit | Supported Keywords |
|-----------|-------------------------------------------|
| Seconds | `s`, `sec`, `secs`, `second`, `seconds` |
| Minutes | `m`, `min`, `mins`, `minute`, `minutes` |
| Hours | `h`, `hr`, `hrs`, `hour`, `hours` |
| Days | `d`, `day`, `days` |
| Weeks | `w`, `wk`, `wks`, `week`, `weeks` |
| Quarters | `q`, `qtr`, `qtrs`, `quarter`, `quarters` |
| Years | `y`, `yr`, `yrs`, `year`, `years` |

The snap-to time supports all the time units above, as well as the following day of the week time units:

Suggested change:
- The snap-to time supports all the time units above, as well as the following day of the week time units:
+ The snap-to time supports all the time units above, as well as the following day-of-the-week time units:

Contributor Author: Done ✅


| Time Unit | Supported Keywords |
|-----------|--------------------|
| Sunday | `w0`, `w7` |
| Monday | `w1` |
| Tuesday | `w2` |
| Wednesday | `w3` |
| Thursday | `w4` |
| Friday | `w5` |
| Saturday | `w6` |

The special relative time string `now` for the current timestamp is also supported.

Reviewer: this needs to be mentioned higher up (like on the first line of the usage/description).

Reviewer: Mention that now is calculated once at the start of the query execution, and used for relative datetime calculations for that query.

Reviewer: wait... can you verify the above comment? If you print out now() twice, does it give exactly the same result (down to the millisecond)?

Contributor Author: Yes, down to the millisecond (screenshot attached).

Contributor Author: Moved the mention of now closer to the start, and specified that the current timestamp is the same for all relative timestamps in the same query. Let me know what you think.


For example, if the current timestamp is Monday, January 03, 2000 at 01:01:01 am:

| Relative String | Description | Resulting Relative Time |
|-----------------|--------------------------------------------------------------|---------------------------------------------|
| `-60m` | Sixty minutes ago | Monday, January 03, 2000 at 00:01:01 am |
| `-1h` | One hour ago | Monday, January 03, 2000 at 00:01:01 am |
| `+2wk` | Two weeks from now | Monday, January 17, 2000 at 01:01:01 am |
| `-1h@w3` | One hour ago, rounded to the start of the previous Wednesday | Wednesday, December 29, 1999 at 00:00:00 am |
| `@d` | Start of the current day | Monday, January 03, 2000 at 00:00:00 am |
| `now` | Now | Monday, January 03, 2000 at 01:01:01 am |

Argument type: STRING

Return type: TIMESTAMP

Example:

os> source=people | eval seconds_diff = timestampdiff(SECOND, now(), relative_timestamp("now")) | fields seconds_diff | head 1
fetched rows / total rows = 1/1
+--------------+
| seconds_diff |
|--------------|
| 0 |
+--------------+

os> source=people | eval hours_diff = timestampdiff(HOUR, now(), relative_timestamp("+1h")) | fields hours_diff | head 1
fetched rows / total rows = 1/1
+------------+
| hours_diff |
|------------|
| 1 |
+------------+

os> source=people | eval day = day_of_week(relative_timestamp("@w0")) | fields day | head 1
fetched rows / total rows = 1/1
+-----+
| day |
|-----|
| 1 |
+-----+
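
To make the offset and snap-to semantics concrete, the sketch below resolves a relative string against a fixed current timestamp using plain `java.time`. It is illustrative only: the PR's actual parsing lives in `TimeUtils.getRelativeLocalDateTime` and accepts the full keyword tables, while this sketch handles just single-letter units and the `@d`/`@w<N>` snaps used in the examples above.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper mirroring the documented semantics (not the PR's TimeUtils).
public final class RelativeTimeSketch {

    public static LocalDateTime resolve(String relative, LocalDateTime now) {
        if (relative.equalsIgnoreCase("now")) {
            return now;
        }
        int at = relative.indexOf('@');
        String offset = at >= 0 ? relative.substring(0, at) : relative;
        String snap = at >= 0 ? relative.substring(at + 1).toLowerCase() : null;

        LocalDateTime result = now;
        if (!offset.isEmpty()) {
            int sign = offset.startsWith("-") ? -1 : 1;
            String body = offset.substring(1);                         // e.g. "1h", "2w"
            long n = Long.parseLong(body.substring(0, body.length() - 1));
            switch (Character.toLowerCase(body.charAt(body.length() - 1))) {
                case 'm': result = result.plusMinutes(sign * n); break;
                case 'h': result = result.plusHours(sign * n); break;
                case 'd': result = result.plusDays(sign * n); break;
                case 'w': result = result.plusWeeks(sign * n); break;
                default: throw new IllegalArgumentException("unit not handled in this sketch");
            }
        }
        if (snap != null) {
            if (snap.length() == 2 && snap.charAt(0) == 'w' && Character.isDigit(snap.charAt(1))) {
                int dow = snap.charAt(1) - '0';                        // 0 or 7 = Sunday, 1-6 = Monday-Saturday
                DayOfWeek target = (dow == 0 || dow == 7) ? DayOfWeek.SUNDAY : DayOfWeek.of(dow);
                result = result.toLocalDate().with(TemporalAdjusters.previousOrSame(target)).atStartOfDay();
            } else if (snap.startsWith("d")) {
                result = result.toLocalDate().atStartOfDay();          // start of the current day
            } else {
                throw new IllegalArgumentException("snap unit not handled in this sketch");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2000, 1, 3, 1, 1, 1);     // Monday, January 03, 2000 01:01:01
        System.out.println(resolve("-1h@w3", now));                    // 1999-12-29T00:00 (previous Wednesday)
        System.out.println(resolve("@d", now));                        // 2000-01-03T00:00
        System.out.println(resolve("+2w", now));                       // 2000-01-17T01:01:01
    }
}
```

Note that the snap is applied after the offset, which is why `-1h@w3` rounds the already-offset time down to the previous Wednesday rather than snapping first.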

### `SECOND`

1 change: 1 addition & 0 deletions docs/ppl-lang/ppl-where-command.md
@@ -61,3 +61,4 @@ PPL query:
| eval factor = case(a > 15, a - 14, isnull(b), a - 7, a < 3, a + 1 else 1)
| where case(factor = 2, 'even', factor = 4, 'even', factor = 6, 'even', factor = 8, 'even' else 'odd') = 'even'
| stats count() by factor`
- `source = table | where timestamp >= relative_timestamp("-1d@w6")`
@@ -368,6 +368,58 @@ class FlintSparkPPLBuiltInDateTimeFunctionITSuite
assertSameRows(Seq(Row(3)), frame)
}

test("test RELATIVE_TIMESTAMP") {
var frame = sql(s"""
| source = $testTable
| | eval seconds_diff = timestampdiff(SECOND, now(), relative_timestamp("now"))
| | fields seconds_diff
| | head 1
| """.stripMargin)
assertSameRows(Seq(Row(0)), frame)

frame = sql(s"""
| source = $testTable
| | eval hours_diff = timestampdiff(HOUR, now(), relative_timestamp("+1h"))
| | fields hours_diff
| | head 1
| """.stripMargin)
assertSameRows(Seq(Row(1)), frame)

frame = sql(s"""
| source = $testTable
| | eval day = day_of_week(relative_timestamp("@w0"))
| | fields day
| | head 1
| """.stripMargin)
assertSameRows(Seq(Row(1)), frame)
}

// TODO #957: Support earliest
ignore("test EARLIEST") {
var frame = sql(s"""
| source = $testTable
| | eval earliest_hour_before = earliest(now(), "-1h")
| | eval earliest_now = earliest(now(), "now")
| | eval earliest_hour_after = earliest(now(), "+1h")
| | fields earliest_hour_before, earliest_now, earliest_hour_after
| | head 1
| """.stripMargin)
assertSameRows(Seq(Row(true, true, false)), frame)
}

// TODO #957: Support latest
ignore("test LATEST") {
var frame = sql(s"""
| source = $testTable
| | eval latest_hour_before = latest(now(), "-1h")
| | eval latest_now = latest(now(), "now")
| | eval latest_hour_after = latest(now(), "+1h")
| | fields latest_hour_before, latest_now, latest_hour_after
| | head 1
| """.stripMargin)
assertSameRows(Seq(Row(false, true, true)), frame)
}
Contributor Author: @acarbonetto Any idea on better tests for earliest and latest that don't require mocking the current time? Ultimately, earliest and latest are really only wrappers around the relative_timestamp function (earliest(field_name, "-1h@d") is equivalent to field_name >= relative_timestamp("-1h@d")), so it seems fine to me as long as we have pretty robust unit tests for relative_timestamp. Thoughts?

Reviewer: Tests are cheap, so it's okay to have duplicate tests. Since IT tests mostly focus on testing the API and integration, I don't think it's overly valuable to test the backend logic there; leave that to unit tests, where mocking is easily done. If you need to mock the IT test backend, you're probably not doing testing correctly.

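To illustrate the unit-test route mentioned above: because `TimeUtils.getRelativeLocalDateTime(String, LocalDateTime)` takes the current time as an argument, a test can pin it without mocking any clock. A rough JUnit 4 sketch, assuming the PR's `TimeUtils` is on the test classpath and resolves strings as documented; expected values come from the documentation table:

```java
import static org.junit.Assert.assertEquals;

import java.time.LocalDateTime;
import org.junit.Test;

// Sketch only; the import of TimeUtils is omitted because its package depends on the PR's layout.
public class RelativeTimeFixedClockTest {

    // Monday, January 03, 2000 at 01:01:01 - the reference timestamp used in ppl-datetime.md
    private static final LocalDateTime NOW = LocalDateTime.of(2000, 1, 3, 1, 1, 1);

    @Test
    public void appliesOffsetAndSnapAgainstFixedTime() {
        assertEquals(LocalDateTime.of(2000, 1, 3, 0, 1, 1), TimeUtils.getRelativeLocalDateTime("-1h", NOW));
        assertEquals(LocalDateTime.of(1999, 12, 29, 0, 0), TimeUtils.getRelativeLocalDateTime("-1h@w3", NOW));
        assertEquals(NOW, TimeUtils.getRelativeLocalDateTime("now", NOW));
    }
}
```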

test("test CURRENT_TIME is not supported") {
val ex = intercept[UnsupportedOperationException](sql(s"""
| source = $testTable
4 changes: 2 additions & 2 deletions ppl-spark-integration/README.md
@@ -6,7 +6,7 @@ translation between PPL's logical plan to Spark's Catalyst logical plan.
### Context
The next concepts are the main purpose of introduction this functionality:
- Transforming PPL to become OpenSearch default query language (specifically for logs/traces/metrics signals)
- Promoting PPL as a viable candidate for the proposed CNCF Observability universal query language.
- Promoting PPL as a viable candidate for the proposed CNCF Observability universal query language.
- Seamlessly Interact with different datasources such as S3 / Prometheus / data-lake leveraging spark execution.
- Using spark's federative capabilities as a general purpose query engine to facilitate complex queries including joins
- Improve and promote PPL to become extensible and general purpose query language to be adopted by the community
@@ -37,7 +37,7 @@ In Apache Spark, the DataFrame API serves as a programmatic interface for data m

For instance, if you have a PPL query and a translator, you can convert it into DataFrame operations to generate an optimized execution plan. Spark's underlying Catalyst optimizer will convert these DataFrame transformations and actions into an optimized physical plan executed over RDDs or Datasets.

The following section describes the two main options for translating the PPL query (using the logical plan) into the spark corespondent component (either dataframe API or spark logical plan)
The following section describes the two main options for translating the PPL query (using the logical plan) into the spark correspondent component (either dataframe API or spark logical plan)


### Translation Process
@@ -334,6 +334,7 @@ MONTHNAME: 'MONTHNAME';
NOW: 'NOW';
PERIOD_ADD: 'PERIOD_ADD';
PERIOD_DIFF: 'PERIOD_DIFF';
RELATIVE_TIMESTAMP: 'RELATIVE_TIMESTAMP';
SEC_TO_TIME: 'SEC_TO_TIME';
STR_TO_DATE: 'STR_TO_DATE';
SUBDATE: 'SUBDATE';
@@ -747,6 +747,7 @@ dateTimeFunctionName
| NOW
| PERIOD_ADD
| PERIOD_DIFF
| RELATIVE_TIMESTAMP
| QUARTER
| SECOND
| SECOND_OF_MINUTE
@@ -9,7 +9,7 @@
import lombok.RequiredArgsConstructor;
import org.opensearch.sql.data.type.ExprCoreType;

/** The DataType defintion in AST. Question, could we use {@link ExprCoreType} directly in AST? */
/** The DataType definition in AST. Question, could we use {@link ExprCoreType} directly in AST? */

@RequiredArgsConstructor
public enum DataType {
@@ -133,6 +133,9 @@ public enum BuiltinFunctionName {
LOCALTIMESTAMP(FunctionName.of("localtimestamp")),
SYSDATE(FunctionName.of("sysdate")),

// Relative timestamp functions
RELATIVE_TIMESTAMP(FunctionName.of("relative_timestamp")),

/** Text Functions. */
TOSTRING(FunctionName.of("tostring")),

@@ -20,9 +20,12 @@
import scala.collection.JavaConverters;
import scala.collection.mutable.WrappedArray;

import java.lang.Boolean;
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.Collection;
import java.util.List;
import java.util.Map;
@@ -35,6 +38,9 @@

public interface SerializableUdf {

abstract class SerializableAbstractFunction1<T1,R> extends AbstractFunction1<T1,R>
implements Serializable {
}

abstract class SerializableAbstractFunction2<T1, T2, R> extends AbstractFunction2<T1, T2, R>
implements Serializable {
@@ -109,7 +115,7 @@ public String apply(String jsonStr, WrappedArray<String> elements) {
}
}
};

Function2<String, String, Boolean> cidrFunction = new SerializableAbstractFunction2<>() {

IPAddressStringParameters valOptions = new IPAddressStringParameters.Builder()
@@ -197,9 +203,18 @@ public BigInteger apply(String ipAddress) {
};
}

abstract class SerializableAbstractFunction1<T1,R> extends AbstractFunction1<T1,R>
implements Serializable {
}
/**
* Returns the {@link Timestamp} corresponding to the given relative time string and current timestamp.
* Throws {@link RuntimeException} if the relative timestamp string is not supported.
*/
Function2<String, Timestamp, Timestamp> relativeTimestampFunction = new SerializableAbstractFunction2<String, Timestamp, Timestamp>() {
@Override
public Timestamp apply(String relativeDateTimeString, Timestamp currentTimestamp) {
LocalDateTime currentLocalDateTime = currentTimestamp.toLocalDateTime();
LocalDateTime relativeLocalDateTime = TimeUtils.getRelativeLocalDateTime(relativeDateTimeString, currentLocalDateTime);
return Timestamp.valueOf(relativeLocalDateTime);
}
};

/**
* Get the function reference according to its name
@@ -254,6 +269,15 @@ static ScalaUDF visit(String funcName, List<Expression> expressions) {
Option.apply("ip_to_int"),
false,
true);
case "relative_timestamp":
return new ScalaUDF(relativeTimestampFunction,
DataTypes.TimestampType,
seq(expressions),
seq(),
Option.empty(),
Option.apply("relative_timestamp"),
false,
true);
default:
return null;
}
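
As a final sanity check of the wiring above, the UDF body can also be exercised directly, outside Spark. This is a hedged sketch: the import of SerializableUdf is omitted because its package path depends on the PR's layout, and the expected output assumes TimeUtils resolves `-1h` as documented.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

public class RelativeTimestampUdfDemo {
    public static void main(String[] args) {
        // The documentation's reference point: Monday, January 03, 2000 at 01:01:01
        Timestamp current = Timestamp.valueOf(LocalDateTime.of(2000, 1, 3, 1, 1, 1));

        // Interface fields are implicitly public static final, so the Function2 is directly accessible.
        Timestamp oneHourAgo = SerializableUdf.relativeTimestampFunction.apply("-1h", current);
        System.out.println(oneHourAgo);  // expected: 2000-01-03 00:01:01.0
    }
}
```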