Predictions ahead of time #184

Open
Peter9192 opened this issue Sep 18, 2023 · 4 comments

Comments

@Peter9192
Collaborator

Currently we're predicting "the day of year on which event X happened", given all the data (temperature, satellite, etc.) covering the entire growing season. This means we can only predict retrospectively, after the season has ended.

Ideally, we'd also like to make forecasts, i.e. predictions ahead of time. At some point during the growing season, we want to predict, given the latest information available, when the event will occur (or whether it has perhaps occurred already).

To achieve this, we would like to make the following two changes to the training data:

  1. Predict "days until/since the event" instead of the absolute Day Of Year (DOY) of the event.
  2. Duplicate the training data several times, each time subtracting a different offset from the DOY. Then, for each offset, extract meaningful features that are valid at any time during the growing season. For example: cumulative temperature since the start of the growing season, temperature during the past 10, 20 and/or 30 days, number of days on which the temperature exceeded 15 degrees, etc.
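
To make the two changes above concrete, here is a minimal sketch with pandas. It is not the project's implementation: the function name `make_forecast_samples`, the column names, and the synthetic temperature series are all made up for illustration. Each forecast day becomes one training row whose features use only data up to that day and whose target is the number of days until (positive) or since (negative) the event.

```python
import numpy as np
import pandas as pd


def make_forecast_samples(temperature, event_doy, forecast_doys, threshold=15.0):
    """Build one training row per forecast day from a single season.

    temperature: Series of daily temperature indexed by day of year (hypothetical layout).
    event_doy: observed day of year of the event.
    forecast_doys: days of year at which a forecast would be made.
    """
    rows = []
    for doy in forecast_doys:
        # Label-based slice: keep only the data available on the forecast day.
        seen = temperature.loc[:doy]
        rows.append(
            {
                "forecast_doy": doy,
                "cumulative_temp": seen.sum(),
                "mean_temp_last_10": seen.tail(10).mean(),
                "mean_temp_last_30": seen.tail(30).mean(),
                "days_above_threshold": int((seen > threshold).sum()),
                # Target: positive means the event is still ahead, negative means it has passed.
                "days_until_event": event_doy - doy,
            }
        )
    return pd.DataFrame(rows)


# Synthetic season: temperature rises linearly with day of year (illustrative only).
doys = np.arange(1, 201)
temperature = pd.Series(doys / 10.0 + 5.0, index=doys)

samples = make_forecast_samples(temperature, event_doy=120, forecast_doys=[60, 90, 150])
print(samples)
```

Every season contributes as many rows as there are forecast days, which is the "duplication" in step 2. The target is negative for forecast days after the event, which covers the "it has occurred already" case.
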
@Peter9192
Collaborator Author

@mkhzadeh it would be helpful to link to your example notebook here

@Peter9192
Collaborator Author

The transposed columns get in the way of implementing this neatly. Perhaps we could:

  • Use xarray for multidimensional data until the very last step, when we convert it to a dataframe
  • Use daft.io to store timeseries in a single cell in dataframes
  • Use a MultiIndex?
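
As a rough illustration of the MultiIndex option only (the site names, years, and values below are invented, and this is not how the package currently stores its data), a wide table with one column per day of year can be reshaped into a long series indexed by (site, year, doy). Features for a given forecast day then become a filter plus a groupby:

```python
import numpy as np
import pandas as pd

# Hypothetical wide layout: one row per (site, year), one column per day of year.
doys = np.arange(1, 11)
wide = pd.DataFrame(
    [doys * 1.0, doys * 2.0],
    index=pd.MultiIndex.from_tuples(
        [("site_a", 2020), ("site_a", 2021)], names=["site", "year"]
    ),
    columns=pd.Index(doys, name="doy"),
)

# Long layout: a Series indexed by (site, year, doy).
long = (
    wide.reset_index()
    .melt(id_vars=["site", "year"], var_name="doy", value_name="temperature")
    .set_index(["site", "year", "doy"])["temperature"]
)

# Cumulative temperature using only the days up to the forecast day.
forecast_doy = 5
available = long[long.index.get_level_values("doy") <= forecast_doy]
cumulative = available.groupby(level=["site", "year"]).sum()
print(cumulative)
```

The same filter-then-aggregate pattern would work for any forecast day, without needing a separate set of per-day columns for each offset.
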

@Peter9192
Collaborator Author

I created an example notebook in #185 to illustrate the main idea and steps.
@sverhoeven I think this could be relevant for finalizing and testing the data load/transform/combine methods. We could discuss it some time.

@khzadeh
Contributor

khzadeh commented Sep 22, 2023

Please consider "Duplication of the training data" on hold.
