scx_layered: More optimal core allocation #1109
base: main
Conversation
This is cool. It might be cool if this were P-core/E-core aware, but I don't know how that could work well without knowing whether the user prefers energy efficiency or performance (maybe via a flag?).
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Here's an updated approach; some implementation details may keep changing until I've tested this with a real workload properly. I will remove the draft status once I've tested with some real workload and verified the edge cases, and then it should be good to go.

Approach

The idea is to assign entire LLCs to layers at once, so allocation happens at LLC granularity. There is a notion of heavy "sticky" layers and light "low" layers, based on utilization. Sticky layers can forcibly reclaim/reassign LLCs from low layers. LLCs used by sticky layers are also not visible in the "free" LLC pool for allocation. For now I was hacking into the code to mark layers sticky by matching on the name, but it should either be done automatically (some threshold layer size) or through the config, by indicating main workload vs. misc stuff there.

Compaction

Compaction is driven by layer utilization. Low/light layers are merged into the same or fewer LLCs based on a utilization target (hardcoded to 20% but will change to something configurable).

Hysteresis

If a layer stays above or below its utilization range for 2 (arbitrarily chosen for now) consecutive intervals of the step function, only then do we grow or shrink it. This avoids flip-flopping the core reallocation on sudden spikes or boundary conditions. The utilization range is hardcoded for testing for now but can be made configurable.

TODOs:
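A minimal sketch of the hysteresis described above: a layer is resized only after its utilization stays outside the target band for 2 consecutive step-function intervals. The struct, constants, and band bounds are illustrative assumptions, not the actual scx_layered code.

```rust
const HYSTERESIS_INTERVALS: u32 = 2; // arbitrary, as noted above
const UTIL_LOW: f64 = 0.20; // hardcoded compaction target for now
const UTIL_HIGH: f64 = 0.80; // assumed upper bound for illustration

#[derive(Default)]
struct LayerHysteresis {
    above: u32, // consecutive intervals above UTIL_HIGH
    below: u32, // consecutive intervals below UTIL_LOW
}

enum Resize {
    Grow,
    Shrink,
    Hold,
}

impl LayerHysteresis {
    // Called once per step-function interval with the layer's utilization.
    fn step(&mut self, util: f64) -> Resize {
        if util > UTIL_HIGH {
            self.above += 1;
            self.below = 0;
        } else if util < UTIL_LOW {
            self.below += 1;
            self.above = 0;
        } else {
            // Back inside the band: reset both streaks.
            self.above = 0;
            self.below = 0;
        }
        if self.above >= HYSTERESIS_INTERVALS {
            self.above = 0;
            Resize::Grow
        } else if self.below >= HYSTERESIS_INTERVALS {
            self.below = 0;
            Resize::Shrink
        } else {
            Resize::Hold
        }
    }
}
```

A single spike above or below the band yields `Hold`; only a sustained excursion triggers a resize.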
    for (i, layer) in self.layers.iter().enumerate() {
        let owned = self.sched_stats.layer_utils[i][LAYER_USAGE_OWNED];
        let open = self.sched_stats.layer_utils[i][LAYER_USAGE_OPEN];
        let total_util = owned + open;
When determining the target number of CPUs, open consumptions are considered iff the layer has no CPUs, because otherwise grouped layers end up overallocating. Also, I wonder whether it'd make more sense to determine the number of LLCs to allocate from the result of the target CPU calculations.
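The suggestion above can be sketched as a small helper: open (shared) consumption counts toward a layer's target only when the layer currently owns no CPUs, so grouped layers don't overallocate. The function name and parameters are hypothetical, not the actual scx_layered calculation.

```rust
// Hypothetical sketch: derive a layer's target CPU count from its owned
// utilization, adding open utilization only when it owns no CPUs yet.
fn target_cpus(owned_util: f64, open_util: f64, ncpus_assigned: usize, util_per_cpu: f64) -> usize {
    let util = if ncpus_assigned == 0 {
        // A layer with no CPUs needs its open consumption covered too.
        owned_util + open_util
    } else {
        // Otherwise count only owned consumption to avoid overallocating
        // grouped layers whose open work already runs elsewhere.
        owned_util
    };
    (util / util_per_cpu).ceil() as usize
}
```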
Note that the allocation needs to be fair within each LLC too, for Intel and other CPUs with one or a few LLCs.
    wants
}

fn weighted_target_llcs(&self, raw_wants: &[usize]) -> Vec<usize> { |
Ditto, wouldn't it make more sense to determine this according to the number of CPUs allocated to each layer?
    let assigned_count = layer.llcs_assigned.len();
    if layer.is_sticky && assigned_count > 0 {
        // remove from free_llcs
        for &llc_id in layer.llcs_assigned.keys() {
These operations may be easier with HashSet or BTreeSet.
This PR documents my initial attempt at doing more optimal layer core order generation, and invites others to provide ideas.
I am not continuing to fix the current attempt (there are a few odd order generations with more LLCs), but the idea is to space out layers as much as possible to minimize overlaps. For this we use a greedy approach: find the segment with the maximum run of unallocated LLCs and try to place a new layer there, then grow it in either direction until running out of cores.
For a machine with two LLCs, CPUs 0-19 and 20-39:
Old:
New:
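The greedy placement step can be sketched as follows: scan for the longest contiguous run of unallocated LLCs and place the new layer at its center, so it can grow in either direction. This is an illustrative sketch, not the PR's actual implementation.

```rust
// Given per-LLC allocation flags, return the LLC index at the center of the
// longest free run, or None if every LLC is already allocated.
fn place_layer(allocated: &[bool]) -> Option<usize> {
    let (mut best_start, mut best_len) = (0, 0);
    let (mut start, mut len) = (0, 0);
    for (i, &used) in allocated.iter().enumerate() {
        if !used {
            if len == 0 {
                start = i; // a new free run begins here
            }
            len += 1;
            if len > best_len {
                best_start = start;
                best_len = len;
            }
        } else {
            len = 0; // run broken by an allocated LLC
        }
    }
    if best_len == 0 {
        None
    } else {
        // Place at the run's center to leave room to grow both ways.
        Some(best_start + best_len / 2)
    }
}
```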
We can be more intelligent here, like growing out in our own NUMA domain's LLCs first before spreading outside that, but all of this is futile because this algorithm won't be globally optimal. It assumes all layers have equal sizes and load, which is not true. Using a "weight" to push layers left or right when allocating LLCs works, but cannot adapt to changing layer sizes at runtime, which is the prevalent case. Thus, there will be overrun eventually even with this algorithm.
Instead, after talking to Tejun, I will update this to estimate the layer size at runtime, and regenerate the core order by picking free LLCs for each layer starting with those with the greatest size (thus precedence), and attempt to pack the rest if necessary.
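The planned replacement above could look roughly like this: order layers by their estimated size, largest first, hand out free LLCs in that order, and pack the remainder once the free pool runs dry. The function, the packing fallback, and all names are assumptions for illustration only.

```rust
// Hypothetical sketch: assign free LLC ids to layers in descending order of
// estimated size (so bigger layers get precedence), packing overflow into an
// already-used LLC when the free pool is exhausted. Assumes nr_llcs > 0.
fn assign_llcs(layer_sizes: &[usize], nr_llcs: usize) -> Vec<Vec<usize>> {
    // Layer indices sorted by estimated size, largest first.
    let mut order: Vec<usize> = (0..layer_sizes.len()).collect();
    order.sort_by_key(|&i| std::cmp::Reverse(layer_sizes[i]));

    let mut free: Vec<usize> = (0..nr_llcs).collect();
    let mut assigned: Vec<Vec<usize>> = vec![Vec::new(); layer_sizes.len()];
    for &i in &order {
        for _ in 0..layer_sizes[i] {
            match free.pop() {
                Some(llc) => assigned[i].push(llc),
                // Free pool exhausted: pack into the last LLC as a fallback.
                None => assigned[i].push(nr_llcs - 1),
            }
        }
    }
    assigned
}
```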