Generate recursive forecasts in chunks #197

Merged
16 commits merged into business-science:master on Sep 11, 2022

Conversation

@joranE (Contributor) commented Aug 13, 2022

This looks at the number of complete rows in new_data to determine the chunk size, then generates the predicted values in chunks. It yields significant performance improvements in cases where the lags used aren't consecutive (e.g., if you have lags 5, 10 & 15 in your recipe, this will generate predictions in chunks of 5).
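Schematically, the inference works along these lines (a minimal sketch rather than the actual implementation; infer_chunk_size is an illustrative name, and it assumes new_data holds the future predictor rows with lag features precomputed):

```r
# Minimal sketch: count the run of complete rows at the top of
# new_data. With lags 5, 10 & 15, the first 5 future rows have all
# their lag features available from observed data, so 5 predictions
# can be generated before any predicted value is needed as a lag.
infer_chunk_size <- function(new_data) {
  complete <- stats::complete.cases(new_data)
  n_leading <- if (all(complete)) length(complete) else which(!complete)[1L] - 1L
  max(n_leading, 1L)  # fall back to row-by-row prediction
}
```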

@joranE (Contributor, Author) commented Aug 14, 2022

Two other, more minor ideas for potential performance improvements:

  1. Move the creation of .temp_new_data outside the for loop and update the y_var column of .temp_new_data with new predictions as you go, instead of modifying new_data and re-calling bind_rows() each time. This is a very easy change, but probably makes only a very small difference in performance outside of rare cases.
  2. Modify .transform() so that it operates on a constant-size sliding window of .temp_new_data each iteration, instead of recalculating all the lags for the entire data set every time. This would require changes upstream in the creation of .transform in object$spec so that it always lops off slice_tail(n = chunk_size) instead of slice_tail(n = as.integer(new_data_size) %>% .[slice_idx, ]). This is a somewhat bigger change, but could have more significant performance benefits in cases with many lagged variables. (A sketch of both ideas follows below.)

Edit: I updated the pull request to incorporate these performance improvements as well.
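Roughly, the loop with both changes applied looks like this (a simplified sketch under assumed names, not the merged code; .transform, .temp_new_data, and y_var are the objects discussed above, while predict_in_chunks is illustrative and predict() is assumed to return a plain numeric vector):

```r
predict_in_chunks <- function(model, train_tail, new_data, y_var,
                              chunk_size, .transform) {
  # Idea 1: build the combined frame once and write predictions into
  # its y_var column, instead of re-calling bind_rows() each pass.
  .temp_new_data <- dplyr::bind_rows(train_tail, new_data)
  n_train <- nrow(train_tail)
  n_new   <- nrow(new_data)

  idx <- 1L
  while (idx <= n_new) {
    rows <- idx:min(idx + chunk_size - 1L, n_new)

    # Idea 2: recompute lag features only over the rows this chunk can
    # see, rather than over the entire data set on every iteration.
    visible  <- .temp_new_data[seq_len(n_train + max(rows)), ]
    features <- dplyr::slice_tail(.transform(visible), n = length(rows))

    # Predict the whole chunk at once, then write it back so the next
    # chunk can lag off these values.
    .temp_new_data[n_train + rows, y_var] <- predict(model, features)

    idx <- idx + chunk_size
  }

  # The predictions are the filled-in tail of the y_var column
  .temp_new_data[[y_var]][n_train + seq_len(n_new)]
}
```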

joranE added 2 commits August 16, 2022 19:29
…modify .transform so that it is only ever called on the minimum necessary number of rows of .temp_new_data
@mdancho84 (Contributor) commented

I'm running the checks on this... I want to see if anything breaks.

@joranE (Contributor, Author) commented Aug 17, 2022

Well that didn't go well. :(

Looks like the modeltime_refit() tests all break because it's being fed new_data that has already had all the lagged features constructed, so it has no missing values. I'm happy to keep trying to get this to work, unless you think it's not going to be possible.

@mdancho84 (Contributor) commented

Hmmm, I'm not sure. It's tough for me to tell. But you can certainly keep trying. I appreciate your work.

@joranE (Contributor, Author) commented Aug 18, 2022

So the only way I could get it to work was to add an explicit chunk_size argument to recursive(). There just aren't enough constraints on new_data to reliably infer a sensible chunk size from the data. It defaults to 1, of course, so by default it still generates the predictions one row at a time.
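Usage is otherwise unchanged; something like this (column and object names here are illustrative, not from the test suite):

```r
library(modeltime)
library(parsnip)
library(timetk)
library(dplyr)

# Lag transform matching the lags-5/10/15 example above
lag_transform <- function(data) {
  data %>% tk_augment_lags(value, .lags = c(5, 10, 15))
}

model_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(
    value ~ value_lag5 + value_lag10 + value_lag15,
    data = lag_transform(training_data) %>% tidyr::drop_na()
  ) %>%
  recursive(
    transform  = lag_transform,
    train_tail = tail(training_data, n = 15),  # enough history for the largest lag
    chunk_size = 5  # new argument; chunk_size = 1 gives the old row-by-row behavior
  )
```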

I added tests for the case when chunk_size > 1 in test-recursive-chunk.R. This version passes all tests in test-recursive.R and test-recursive-chunk.R when I run them locally. However, I can't get any of the tests to run via devtools::test() at all: nothing gets past the .onLoad() hook in zzz.R, and they still don't run if I comment that out.

Anyway, this is as close to working as I'm probably going to be able to get it without some help or direction.

@mdancho84 (Contributor) commented

Circling back... I'm running the tests.

The main thing I need to verify is backwards compatibility, meaning that chunk_size = 1 should reproduce the default behavior that is used currently.

Regarding testing locally, you'll need to run Build > Check

@joranE (Contributor, Author) commented Aug 23, 2022

Ok, I'm still noodling on this, though. Specifically, I didn't realize there are some recursive methods in modeltime-refit.R that need to be updated.

@joranE (Contributor, Author) commented Aug 26, 2022

I've got it to the point where devtools::check(vignettes = FALSE) runs with only two NOTEs: one about the number of dependencies and one "no visible binding for global variable". All the vignettes build except the spark one, which I can't get running on my machine due to a Java version incompatibility that I frankly don't have the patience to spend time on. I've been using this branch on my own models for a few days now and it seems fine, but I want to write some more careful tests just to be sure.

@mdancho84 (Contributor) commented

OK, the spark tests I can get running. Java is a pain.

I agree with writing a couple of tests just to see what happens.

The global variables are an easy fix. You can just add them to the global_variables.R file and that NOTE should go away. Or I can add them; either way, I'm not worried about that issue.
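For reference, that fix is a one-liner; the names below are placeholders for whatever symbols the NOTE actually flags:

```r
# In R/global_variables.R -- suppresses the "no visible binding for
# global variable" NOTE for NSE column names used in dplyr pipelines.
utils::globalVariables(c(".temp_new_data", "chunk_size"))
```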

The main thing is making sure my code doesn't break, and that your additions are tested to ensure they work properly.

@joranE (Contributor, Author) commented Sep 8, 2022

I caught a few additional edge cases (when the number of rows in new_data is unusually small) and added tests for them, and everything seems to be passing except the spark stuff on the macOS build. It seems fine to me so far (I've been using the branch myself), but if you have ideas for other edge cases you'd like tests for, let me know and I can work on writing them.
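The gist of those tests, schematically (not the literal contents of test-recursive-chunk.R; make_recursive_fit() and future_data are stand-ins for the fixture code):

```r
library(testthat)

test_that("chunked forecasts match the row-by-row default", {
  # Chunking is purely a performance optimization, so predictions with
  # chunk_size > 1 must be identical to the chunk_size = 1 baseline.
  fit_1 <- make_recursive_fit(chunk_size = 1)
  fit_5 <- make_recursive_fit(chunk_size = 5)
  expect_equal(
    predict(fit_1, new_data = future_data),
    predict(fit_5, new_data = future_data)
  )
})

test_that("new_data shorter than chunk_size still predicts", {
  fit_5 <- make_recursive_fit(chunk_size = 5)
  short <- dplyr::slice_head(future_data, n = 3)
  expect_silent(predict(fit_5, new_data = short))
})
```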

@mdancho84 merged commit 7e0201c into business-science:master on Sep 11, 2022
@mdancho84 (Contributor) commented

I've merged this PR. Tests are passing, and the performance improvement should be substantial in certain scenarios.

@mdancho84 (Contributor) commented

Hi @joranE,

I'm running into an issue with modeltime.ensemble here: business-science/modeltime.ensemble#22

The challenge is to bring the chunked recursive forecasting over to ensembles.

I'd like your help if possible.

@mdancho84 (Contributor) commented

I think I have this solved.
