-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scx_utils: Add support for AMD Pref Core #1207
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM as a starting point.
I don't know much about AMD prefcore, but it'd be nice to have also some logic to classify preferred cores vs regular cores, eventually providing a method to get the list of these cores, so that schedulers can easily create separate scheduling domains for example, or something similar.
rust/scx_utils/src/topology.rs
Outdated
/// Currently used by AMD CPUs to show which cores to use vs others. | ||
/// The higher the number then the higher the priority the core. | ||
/// This is set at boot time but can change at runtime via tunables. | ||
pub pref_rank: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it can change at runtime (via sysfs I guess) would it make sense to provide also a method to refresh this value? This could be used by a scheduler while it's running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll have a look at adding that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often can this change? How often does it practically change? Should it be read on each access?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my current usecase it gets changed when a game is launched to rank one CCD higher than the other then changed back to rank the other CCD higher when the game is closed. But unless you have something that does that or the user does it themselves it won't change at all after boot.
I'm really not sure what the best way of managing this is, so open to any ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you on a CPU with 3D V cache on one CCD and not on the other? It looks like the updates are triggered from ACPI device notification on the processor but it doesn't seem to generate any notifications to the userspace. I wonder how ACPI knows that a game has launched.
Hmm... The fair sched is using to modify load balancing decisions, basically to direct tasks towards higher score CPUs. We can definitely use it in a similar way and at least in the way that fair uses it, it seems to be treated in a similar way as how big/little cores are handled.
I'm not sure how the interface should be but:
- We don't get notifications from the kernel on whether the values have changed or not (this we can improve by updating
amd_pstate_update_limits()
to generate sysfs file modified events later), but I don't like the idea of letting each scheduler to make the decisions on when to read, how frequently and so on. It doesn't look like the ranking is going to change at high frequency (that wouldn't be useful anyway), so the topology crate can cache the values and when the values are needed, read them if the last read values are too old (e.g. over a second or whatever). - Making the raw values accessible may be useful but the values are unitless and all that they provide seems to be ranking, so let's provide it as ordering - ie. provide sorted list and comparison interface (is cpu A better than cpu B).
- Given that the ordering can change over time, it'd be also useful for topology to detect since when the current ordering has been valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @superm1
Could you mind clarify, how this is managed in 6.14+, due the recent changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the CPU has 3D vcache on one CCD and not the other. There is no acpi mechanism to know when a game is launched. I've written a userspace daemon to do that then adjust the cache vs frequency value for amd_x3d_mode in /sys/bus/platform/drivers/amd_x3d_vcache/ which then in turn adjusts the pref core values in sysfs.
The usage should be very similar to big/little yeah. For reference this is useful outside of the 3D vcache usecase as it is also a general ordering of core quality and boostablilty of cores on AMD. Generally the better cores can boost higher so things like a heavy single threaded load should go to the highest ranked core etc.
So based on your points 2 and 3 a better approach might be to instead of having a value on each cpu in the topo instead have a getter that returns a sorted list of cpu ids that caches internally for x seconds (I think this can be reasonably long, 10s or so) with a helper function to check if one cpu is ranked higher than another?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the CPU has 3D vcache on one CCD and not the other. There is no acpi mechanism to know when a game is launched. I've written a userspace daemon to do that then adjust the cache vs frequency value for amd_x3d_mode in /sys/bus/platform/drivers/amd_x3d_vcache/ which then in turn adjusts the pref core values in sysfs.
Ah, I see. Yeah, writing to that file triggers an ACPI action which probably generates the notification on the processor object.
The usage should be very similar to big/little yeah. For reference this is useful outside of the 3D vcache usecase as it is also a general ordering of core quality and boostablilty of cores on AMD. Generally the better cores can boost higher so things like a heavy single threaded load should go to the highest ranked core etc.
So based on your points 2 and 3 a better approach might be to instead of having a value on each cpu in the topo instead have a getter that returns a sorted list of cpu ids that caches internally for x seconds (I think this can be reasonably long, 10s or so) with a helper function to check if one cpu is ranked higher than another?
Yeah, something along that line. Mutability can be a bit tricky and hopefully this can be hidden behind an internal mutex or atomic variables so that the whole topology objects doesn't have to be made mutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@htejun I have updated based on your feedback, please let me know if this matches what we discussed. Also if there is a better way of doing the mutability let me know I am no rust expert (yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Yeah, writing to that file triggers an ACPI action which probably generates the notification on the processor object.
Yes; when writing to that sysfs file it will use an ACPI device to communicate with the platform, and the platform will re-evaluate all rankings. When the new rankings are ready it will emit a CPPC Highest Performance changed
notification (0x85). The amd-pstate driver will respond to this event and will re-read all the rankings. Those rankings will be fed into ITMT.
There also is a netlink event that occurs that I feel userspace could listen to if it's interesting to react directly.
https://github.com/torvalds/linux/blob/v6.13/drivers/acpi/processor_driver.c#L86
So say we a higher up bool to say is pref core is active globally rather than relying on the sched checking for none 0 values on each core? Currently I don't think it is possible to have a mixed system so it is either globally on and all cores are ranked or it is off. |
@arighi I've pushed a commit with a fn to get if pref ranks are enabled globally (via sysfs) as well as one to update the values for all cpus. |
Looks like a linter error otherwise LGTM. |
…us that caches it's contents for 10s, also add is_higher_ranked to directly compare two cpuids in the rankings. Remove the pref_rank from cpu struct in favour of the new mechanism.
545cead
to
e11e01c
Compare
@@ -307,6 +336,62 @@ impl Topology { | |||
} | |||
sibling_cpu | |||
} | |||
|
|||
/// Returns whether AMD preferred core ranking is enabled on this system | |||
pub fn has_pref_rank(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this can change dynamically. This can be a static property.
|
||
/// Returns true if cpu_a has a higher rank than cpu_b. | ||
/// If ranking is not enabled or either CPU is invalid, returns false. | ||
pub fn is_higher_ranked(&self, cpu_a: usize, cpu_b: usize) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if you store the pref value in a AtomicU32
in each CPU, this becomes trivial and users can also read the raw value when they want to with the accessor function refreshing the numbers as necessary. Note that storing into an atomic doesn't require a mutable ref.
cache.last_updated = Instant::now(); | ||
} | ||
|
||
cache.cpu_ids.clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can avoid copying the array each time using Arc::make_mut()
when updating and returning the Arc
clone.
It may also be useful to record and report a generation number together so that the caller can determine whether the order has changed since the last read (e.g. so that the scheduler can skip certain operations if not).
/// Returns a sorted list of CPU IDs from highest to lowest rank. | ||
/// The list is cached internally and refreshed every 10 seconds. | ||
/// If preferred core ranking is not enabled, returns an empty slice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than running every 10 seconds, how about using that netlink event I suggested?
https://github.com/torvalds/linux/blob/v6.13/drivers/acpi/processor_driver.c#L86
Adds AMD Pref Core support to topology.rs as per #1206