You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As it was noticed during the last benchmark tests run, the treatment of unbalanced datasets is suboptimal when running with multiprocessing enabled if one of the RDataFrames is built on top of a dataset whose size is much bigger than the others, the worker that process it end up creating a bottleneck for the entire analysis. Several ways (to be investigated and implemented separately) can fix this issue:
combine the usage of multiprocessing and multithreading: detect in advance the larger datasets and split the workers that get to process these into multiple threads; in order not to increase the number of cores used, the overall number of workers decreases;
using only multiprocessing: detect in advance the larger datasets and split them into different RDataFrames, so that they are taken by different workers; the results can be easily merged at the end to get the proper histograms; this solution also requires something to check that the largest RDataFrames are the first ones sent to the workers.
The text was updated successfully, but these errors were encountered:
As it was noticed during the last benchmark tests run, the treatment of unbalanced datasets is suboptimal when running with multiprocessing enabled if one of the RDataFrames is built on top of a dataset whose size is much bigger than the others, the worker that process it end up creating a bottleneck for the entire analysis. Several ways (to be investigated and implemented separately) can fix this issue:
The text was updated successfully, but these errors were encountered: