Merge pull request #172 from awslabs/sql_sample_apps
Sql sample apps
sethusrinivasan authored Nov 9, 2023
2 parents 16476ad + 8ab0ae0 commit dce0241
Showing 2 changed files with 17 additions and 8 deletions.
8 changes: 5 additions & 3 deletions sample_apps/sql/last_value_fill_forward/README.md
@@ -2,9 +2,11 @@

## Query uses multiple steps

1. create time sequence (not limited to just 10,000 data points)
2. select raw data binned at same intervals
3. join time sequence with raw data as data set that contains NULL values now
1. create time sequence `time_seq_only` (not limited to just 10,000 data points)
2. get all distinct device ids (in this sample data set, the gpio column) as `distinct_gpio`
3. duplicate the time sequence per device id into `time_seq_with_gpio` (in this example, gpio channels) so that each channel can be filled individually
4. select raw data binned at the same intervals as `raw_pos`
5. join time sequence `time_seq_with_gpio` with raw data `raw_pos` into a data set that now contains NULL values
6. use LAST_VALUE on the filled dataset; this query lists just 2 measures: the original temperature, which can contain NULL, and the filled column

The result set shows both the original value, which can contain NULL, and the filled value in a separate column.
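The body of the `filled_set` CTE is collapsed in this diff, so the following is only a sketch of the fill step the README describes, assuming the `dataset` CTE defined in the query and Trino-style `LAST_VALUE ... IGNORE NULLS` window-function support; the output column name `temperature_filled` is illustrative, not taken from this commit:

```sql
-- Sketch only: not the exact collapsed filled_set body from this commit.
-- Carries the last non-NULL temperature forward, per gpio channel.
filled_set as (
    select gpio, time, temperature,
        last_value(temperature) ignore nulls over (
            partition by gpio   -- fill each device/channel independently
            order by time
            rows between unbounded preceding and current row
        ) as temperature_filled -- illustrative column name
    from dataset
)
```

Partitioning by `gpio` is what makes the per-device time sequences above necessary: without one row per device per interval, a channel with a long gap would have no NULL rows to fill.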
@@ -1,25 +1,32 @@
with time_seq as ( -- Timesequence is 30,240 data points, starting 2 weeks ago of total 3 weeks (1 week in future) with 1 min intervals
with time_seq_only as ( -- Time sequence is 30,240 data points, starting 2 weeks ago, spanning 3 weeks total (1 week into the future) at 1 min intervals
select
date_add('day', day,
date_add('hour', hour,
date_add('second', second, bin('2023-09-20 17:12:22.958000000',1m)))) as time
from unnest(sequence(0,3540,60)) t(second) cross join unnest (sequence(0, 23)) as t(hour) cross join unnest (sequence(0, 20)) as t(day)
order by day, hour, second
),
distinct_gpio as ( -- each device identified by gpio needs its own time sequence, this query gets all device ids
select distinct(gpio) as gpio from "amazon-timestream-tools"."sensordata"
),
time_seq_with_gpio as ( -- multiple time sequences, one for each device identified by gpio
select time, gpio from time_seq_only join distinct_gpio on true
),
raw_pos as (
SELECT bin(time, 1m) as p_time,
avg(temperature) as temperature,
gpio
FROM "amazon-timestream-tools"."sensordata" -- adjust if data is loaded to different table
where time between '2023-09-20 17:12:22.958000000' and now() -- sample data set contains data from 09/20/2023
and gpio = '22'
-- and gpio = '22'
GROUP BY gpio, bin(time, 1m)
),
-- dataset contains missing records as just symbol (key), timestamp and all other columns are null
dataset as (
select '22' as gpio, bin(time, 1m) as time, temperature from time_seq
select time_seq_with_gpio.gpio, bin(time, 1m) as time, temperature from time_seq_with_gpio
left join raw_pos
on time_seq.time = raw_pos.p_time
on time_seq_with_gpio.time = raw_pos.p_time
and time_seq_with_gpio.gpio = raw_pos.gpio
),
filled_set as (
SELECT
@@ -35,4 +42,4 @@ filled_set as (
)
select * from filled_set
-- select * from dataset -- use this line to review the original data containing gaps
order by time
order by gpio, time
