In my experiments I use the default value (500) for the bootstrap rounds when estimating the sensitivity threshold. I see in the implementation that this process is very CPU-bound and uses multiple cores when available.
In my environment I have 8 CPU cores, and on large tables it usually takes 1-2 hours to complete before training starts. All this time the GPU in my runtime environment sits idle, waiting for the sensitivity threshold estimation to finish. (Also, Colab sometimes disconnects the runtime because it notices that the runtime is mainly using the CPU.)
I know that setting this to a smaller value will make it run faster, but I wonder if there is a rule of thumb or if it is just a matter of trial and error. I understand that it is important to estimate this threshold accurately, since it is used for early stopping during training.
Thanks!
Hello @echatzikyriakidis, 100 can be a reasonable trade-off. A higher number of bootstrap rounds produces a more stable threshold, so keep that in mind when lowering the value.
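For reference, a minimal sketch of what that looks like, assuming the `fit()` call exposes the bootstrap rounds via a `num_bootstrap` parameter (treat the exact names here as assumptions and check them against the actual signature):

```python
# Sketch: lowering the bootstrap rounds when fitting.
# Assumes a REaLTabFormer-style API where fit() exposes a
# `num_bootstrap` parameter (default 500); verify against the
# actual signature before using.
import pandas as pd
from realtabformer import REaLTabFormer

df = pd.read_csv("large_table.csv")  # placeholder path

model = REaLTabFormer(model_type="tabular")

# 100 rounds trades some threshold stability for a much shorter
# wait before training; the estimation cost scales roughly
# linearly with the number of rounds.
model.fit(df, num_bootstrap=100)
```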
One potential solution is to allow precomputation of the sensitivity threshold outside the fit function. When fitting with the data, one could specify a file containing the precomputed value. The fit function must, however, first check that the parameters used in the precomputation are consistent with the parameters it was passed.
With this implemented, you could perform the precomputation on an instance without an accelerator, save the result, and then switch to a Colab instance with a GPU.
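Roughly, the workflow could look like the sketch below. Everything in it is hypothetical, since the feature does not exist yet: the helper function, the file layout, and the `sensitivity_threshold` fit argument are all placeholders.

```python
# Illustrative sketch only: the helper, file layout, and fit argument
# below are placeholders for a proposed feature, not an existing API.
import json

def compute_sensitivity_threshold(df, num_bootstrap):
    """Stand-in for the expensive CPU-bound bootstrap estimation."""
    return 0.123  # placeholder value

# Step 1, on a CPU-only instance: precompute the threshold and save
# it together with the parameters that influenced it.
params = {"num_bootstrap": 500}
threshold = compute_sensitivity_threshold(None, **params)
with open("sensitivity_threshold.json", "w") as f:
    json.dump({"params": params, "threshold": threshold}, f)

# Step 2, on a GPU instance: load the file, verify the parameters
# match what fit() is about to use, then train.
with open("sensitivity_threshold.json") as f:
    saved = json.load(f)
if saved["params"] != params:
    raise ValueError("Precomputed threshold used different parameters; recompute it.")
# model.fit(df, sensitivity_threshold=saved["threshold"])  # hypothetical argument
```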
If you're open to contributing to this feature, that would be very welcome! See: #16