Commit df6b06d: update usage examples
patrick-zippenfenig committed Jun 15, 2021 (1 parent: 2010ec0)
Showing 1 changed file (README.md) with 80 additions and 6 deletions.
```python
client = meteoblue_dataset_sdk.Client(apikey="xxxxxx")
result = await client.query(query)
```

## Caching results
If you are training a model and re-run your program multiple times, you can enable caching to store results from the meteoblue dataset SDK on disk. A simple file cache can be enabled with:

```python
import zlib
import meteoblue_dataset_sdk
from meteoblue_dataset_sdk.caching import FileCache

# Cache results for 1 day (86400 seconds)
cache = FileCache(path="./mb_cache", max_age=86400, compression_level=zlib.Z_BEST_SPEED)
client = meteoblue_dataset_sdk.Client(apikey="xxxxxx", cache=cache)
```

If you want to implement a different cache (e.g. redis or S3), the SDK offers an abstract base class `caching.cache.AbstractCache`. The required methods are listed [here](./meteoblue_dataset_sdk/caching/abstractcache.py).
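As an illustration, a minimal in-memory cache could look like the sketch below. The `get`/`set` method names and the string-key/bytes-value interface are assumptions here; check the linked `abstractcache.py` for the exact methods `AbstractCache` requires, and subclass it in real code.

```python
import asyncio
from typing import Optional

# Hypothetical in-memory cache sketch. In practice this would subclass
# caching.cache.AbstractCache from the SDK; the get/set signatures below
# are assumptions, consult abstractcache.py for the real interface.
class InMemoryCache:
    def __init__(self):
        self._store = {}

    async def get(self, key: str) -> Optional[bytes]:
        # Return cached bytes, or None on a cache miss
        return self._store.get(key)

    async def set(self, key: str, value: bytes) -> None:
        self._store[key] = value

async def demo():
    cache = InMemoryCache()
    await cache.set("query-hash", b"protobuf-bytes")
    return await cache.get("query-hash")

print(asyncio.run(demo()))  # b'protobuf-bytes'
```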


## Working with timestamps
Time intervals are encoded as simple `start`, `end` and `stride` Unix timestamps. With just a few lines of code, these can be converted to an array of datetime objects:

```python
import datetime as dt

print(timeInterval)
# start: 1546300800
# end: 1546387200
# stride: 3600

timerange = range(timeInterval.start, timeInterval.end, timeInterval.stride)
timestamps = [dt.datetime.fromtimestamp(t) for t in timerange]
```
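The snippet above assumes a `timeInterval` object from a query result. Using the literal values shown in the comment (2019-01-01 00:00 to 2019-01-02 00:00 UTC, hourly), it can be run standalone:

```python
import datetime as dt

# Standalone version of the conversion above, with the example's literal
# values; timezone-aware datetimes avoid local-timezone surprises.
start, end, stride = 1546300800, 1546387200, 3600
timestamps = [
    dt.datetime.fromtimestamp(t, tz=dt.timezone.utc)
    for t in range(start, end, stride)
]
print(len(timestamps))            # 24
print(timestamps[0].isoformat())  # 2019-01-01T00:00:00+00:00
```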

This code works well for regular timesteps like hourly, 3-hourly or daily data. Monthly data, however, is not regular, and the API returns its timestamps as a string array. The following code handles both cases and always returns an array of datetime objects:

```python
import datetime as dt
import dateutil.parser

def meteoblue_timeinterval_to_timestamps(t):
    # Monthly data is irregular; the API then returns time strings instead
    if len(t.timestrings) > 0:
        def map_ts(time):
            # Strings like "20190101-20190131" denote an interval; keep its start
            if "-" in time:
                return dateutil.parser.parse(time.partition("-")[0])
            return dateutil.parser.parse(time)

        return list(map(map_ts, t.timestrings))

    # Regular timesteps: expand start/end/stride Unix timestamps
    timerange = range(t.start, t.end, t.stride)
    return [dt.datetime.fromtimestamp(ts) for ts in timerange]

query = { ... }
result = client.query_sync(query)
timestamps = meteoblue_timeinterval_to_timestamps(result.geometries[0].timeIntervals[0])
```
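The monthly branch of the function above can be exercised on its own. The sample strings here are illustrative, following the `start-end` interval format the parsing logic expects:

```python
import dateutil.parser

# Interval strings such as "20190101-20190131" are reduced to their start
# date; parsing keeps only the part before the dash.
samples = ["20190101-20190131", "20190201-20190228"]
starts = [dateutil.parser.parse(s.partition("-")[0]) for s in samples]
print(starts[0])  # 2019-01-01 00:00:00
```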

## Working with dataframes
To convert a result from the meteoblue dataset API to a pandas DataFrame, a few lines of code help:

```python
import pandas as pd
import numpy as np

# Uses meteoblue_timeinterval_to_timestamps() from the previous section
def meteoblue_result_to_dataframe(geometry):
    t = geometry.timeIntervals[0]
    timestamps = meteoblue_timeinterval_to_timestamps(t)

    n_locations = len(geometry.lats)
    n_timesteps = len(timestamps)

    df = pd.DataFrame(
        {
            "TIMESTAMP": np.tile(timestamps, n_locations),
            "Longitude": np.repeat(geometry.lons, n_timesteps),
            "Latitude": np.repeat(geometry.lats, n_timesteps),
        }
    )

    for code in geometry.codes:
        name = str(code.code) + "_" + code.level + "_" + code.aggregation
        df[name] = code.timeIntervals[0].data

    return df

query = { ... }
result = client.query_sync(query)
df = meteoblue_result_to_dataframe(result.geometries[0])
```
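The `np.tile`/`np.repeat` pairing above is what flattens the per-location series into long format: each location gets the full timestamp sequence, so timestamps cycle while coordinates stay constant per block. A small standalone illustration with made-up values:

```python
import numpy as np
import pandas as pd

# 2 locations x 3 timesteps: timestamps are tiled (repeat the whole
# sequence), coordinates are repeated element-wise (constant per block).
timestamps = ["t0", "t1", "t2"]
lats = [47.0, 48.0]
df = pd.DataFrame({
    "TIMESTAMP": np.tile(timestamps, 2),  # t0 t1 t2 t0 t1 t2
    "Latitude": np.repeat(lats, 3),       # 47 47 47 48 48 48
})
print(len(df))  # 6
```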

## Protobuf format
In the background, data is transferred using protobuf and defined as [this protobuf structure](./meteoblue_dataset_sdk/Dataset.proto).

A 10-year hourly data series for 1 location requires `350 kB` using protobuf, compared to `1600 kB` using JSON. Additionally, the meteoblue Python SDK transfers data gzip-compressed, which reduces the size to only `87 kB`.
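The effect of gzip on structured payloads can be seen with the standard library alone. This toy example is illustrative only; its sizes have nothing to do with the figures quoted above:

```python
import gzip
import json

# Compress a JSON-like time series payload with gzip, as the SDK does for
# transfers. Repetitive structure compresses well.
payload = json.dumps(
    [{"t": i, "temp": 20.0 + i * 0.1} for i in range(1000)]
).encode()
compressed = gzip.compress(payload)
print(len(compressed) < len(payload))  # True
```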
