Generate recursive forecasts in chunks #197

Merged
16 commits merged into business-science:master on Sep 11, 2022

Conversation

@joranE (Contributor) commented Aug 13, 2022

This looks at the number of complete rows in new_data to determine the chunk size, then generates the predicted values in chunks. It yields significant performance improvements in cases where the lags used aren't consecutive (e.g., if you have lags 5, 10 & 15 in your recipe, this will generate predictions in chunks of 5).
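Schematically, the inference works along these lines (a minimal sketch rather than the actual implementation; infer_chunk_size is an illustrative name, and it assumes new_data holds the future predictor rows with lag features precomputed):

```r
# Minimal sketch: count the run of complete rows at the top of
# new_data. With lags 5, 10 & 15, the first 5 future rows have all
# their lag features available from observed data, so 5 predictions
# can be generated before any predicted value is needed as a lag.
infer_chunk_size <- function(new_data) {
  complete <- stats::complete.cases(new_data)
  n_leading <- if (all(complete)) length(complete) else which(!complete)[1L] - 1L
  max(n_leading, 1L)  # fall back to row-by-row prediction
}
```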

@joranE (Contributor, Author) commented Aug 14, 2022

Two other, more minor ideas for potential performance improvements:

  1. Move the creation of .temp_new_data outside the for loop and update the y_var column of .temp_new_data with new predictions as you go, instead of modifying new_data and re-calling bind_rows() each time. This is a very easy change, but probably makes only a very small difference in performance outside of rare cases.
  2. Modify .transform() so that it operates on a constant-size sliding window of .temp_new_data each iteration, instead of recalculating all the lags for the entire data set every time. This would require changes upstream in the creation of .transform in object$spec so that it always lops off slice_tail(n = chunk_size) instead of slice_tail(n = as.integer(new_data_size) %>% .[slice_idx, ]). This is a somewhat bigger change, but could have more significant performance benefits in cases with many lagged variables. (A sketch of both ideas follows below.)

Edit: I updated the pull request to incorporate these performance improvements as well.
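Roughly, the loop with both changes applied looks like this (a simplified sketch under assumed names, not the merged code; .transform, .temp_new_data, and y_var are the objects discussed above, while predict_in_chunks is illustrative and predict() is assumed to return a plain numeric vector):

```r
predict_in_chunks <- function(model, train_tail, new_data, y_var,
                              chunk_size, .transform) {
  # Idea 1: build the combined frame once and write predictions into
  # its y_var column, instead of re-calling bind_rows() each pass.
  .temp_new_data <- dplyr::bind_rows(train_tail, new_data)
  n_train <- nrow(train_tail)
  n_new   <- nrow(new_data)

  idx <- 1L
  while (idx <= n_new) {
    rows <- idx:min(idx + chunk_size - 1L, n_new)

    # Idea 2: recompute lag features only over the rows this chunk can
    # see, rather than over the entire data set on every iteration.
    visible  <- .temp_new_data[seq_len(n_train + max(rows)), ]
    features <- dplyr::slice_tail(.transform(visible), n = length(rows))

    # Predict the whole chunk at once, then write it back so the next
    # chunk can lag off these values.
    .temp_new_data[n_train + rows, y_var] <- predict(model, features)

    idx <- idx + chunk_size
  }

  # The predictions are the filled-in tail of the y_var column
  .temp_new_data[[y_var]][n_train + seq_len(n_new)]
}
```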

joranE added 2 commits August 16, 2022 19:29
…modify .transform so that it is only ever called on the minimum necessary number of rows of .temp_new_data
@mdancho84 (Contributor) commented

I'm running the checks on this... I want to see if anything breaks.

@joranE (Contributor, Author) commented Aug 17, 2022

Well that didn't go well. :(

Looks like the modeltime_refit() tests all break because it's being fed new_data that has already had all the lagged features constructed, so it has no missing values. I'm happy to keep trying to get this to work, unless you think it's not going to be possible.

@mdancho84 (Contributor) commented

Hmmm, I'm not sure. It's tough for me to tell. But you can certainly keep trying. I appreciate your work.

@joranE (Contributor, Author) commented Aug 18, 2022

So the only way I could get it to work was to add an explicit chunk_size argument to recursive(). There just aren't enough constraints on new_data to reliably infer a sensible chunk size from the data. It defaults to 1, of course, so by default it still generates the predictions one row at a time.
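Usage is otherwise unchanged; something like this (column and object names here are illustrative, not from the test suite):

```r
library(modeltime)
library(parsnip)
library(timetk)
library(dplyr)

# Lag transform matching the lags-5/10/15 example above
lag_transform <- function(data) {
  data %>% tk_augment_lags(value, .lags = c(5, 10, 15))
}

model_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(
    value ~ value_lag5 + value_lag10 + value_lag15,
    data = lag_transform(training_data) %>% tidyr::drop_na()
  ) %>%
  recursive(
    transform  = lag_transform,
    train_tail = tail(training_data, n = 15),  # enough history for the largest lag
    chunk_size = 5  # new argument; chunk_size = 1 gives the old row-by-row behavior
  )
```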

I added tests for the case when chunk_size > 1 in test-recursive-chunk.R. This version passes all tests in test-recursive.R and test-recursive-chunk.R when I run them locally. However, I can't get any of the tests to run via devtools::test() at all: nothing gets past the .onLoad() hook in zzz.R, and they still don't run if I comment that out.

Anyway, this is as close to working as I'm probably going to be able to get it without some help or direction.

@mdancho84 (Contributor) commented

Circling back... I'm running the tests.

The main thing I need to verify is backwards compatibility, meaning that chunk_size = 1 should reproduce the default behavior that is used currently.

Regarding testing locally, you'll need to run Build > Check

@joranE (Contributor, Author) commented Aug 23, 2022

Ok, I'm still noodling on this, though. Specifically, I didn't realize there are some recursive methods in modeltime-refit.R that need to be updated.

@joranE (Contributor, Author) commented Aug 26, 2022

I've got it to the point where devtools::check(vignettes = FALSE) runs with only two NOTEs: one about the number of dependencies and one "no visible binding for global variable". All the vignettes build except the spark one, which I can't get running on my machine due to a Java version incompatibility that I frankly don't have the patience to spend time on. I've been using this branch on my own models for a few days now and it seems fine, but I want to write some more careful tests just to be sure.

@mdancho84 (Contributor) commented

OK, the spark tests I can get running. Java is a pain.

I agree with writing a couple of tests just to see what happens.

The global variables are an easy fix. You can just add them to the global_variables.R file and that NOTE should go away. Or I can add them; either way, I'm not worried about that issue.
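For reference, that fix is a one-liner; the names below are placeholders for whatever symbols the NOTE actually flags:

```r
# In R/global_variables.R -- suppresses the "no visible binding for
# global variable" NOTE for NSE column names used in dplyr pipelines.
utils::globalVariables(c(".temp_new_data", "chunk_size"))
```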

The main thing is making sure my code doesn't break, and that your additions are tested to ensure they work properly.

@joranE (Contributor, Author) commented Sep 8, 2022

I caught a few additional edge cases (when the number of rows in new_data is unusually small) and added tests for them, and everything seems to be passing except the spark stuff on the macOS build. It seems fine to me so far (I've been using the branch myself), but if you have ideas for other edge cases you'd like tests for, let me know and I can work on writing them.
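The gist of those tests, schematically (not the literal contents of test-recursive-chunk.R; make_recursive_fit() and future_data are stand-ins for the fixture code):

```r
library(testthat)

test_that("chunked forecasts match the row-by-row default", {
  # Chunking is purely a performance optimization, so predictions with
  # chunk_size > 1 must be identical to the chunk_size = 1 baseline.
  fit_1 <- make_recursive_fit(chunk_size = 1)
  fit_5 <- make_recursive_fit(chunk_size = 5)
  expect_equal(
    predict(fit_1, new_data = future_data),
    predict(fit_5, new_data = future_data)
  )
})

test_that("new_data shorter than chunk_size still predicts", {
  fit_5 <- make_recursive_fit(chunk_size = 5)
  short <- dplyr::slice_head(future_data, n = 3)
  expect_silent(predict(fit_5, new_data = short))
})
```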

@mdancho84 merged commit 7e0201c into business-science:master on Sep 11, 2022
@mdancho84 (Contributor) commented

I've merged this PR. Tests are passing, and the performance improvement should be substantial in certain scenarios.

@mdancho84 (Contributor) commented

Hi @joranE,

I'm running into an issue with modeltime.ensemble here: business-science/modeltime.ensemble#22

The challenge is to bring the chunked recursive forecasting over to ensembles.

I'd like your help if possible.

@mdancho84 (Contributor) commented

I think I have this solved.
