scx_utils: Add support for AMD Pref Core #1207

Open
ferrreo wants to merge 3 commits into main from scx_utils_amd_pref_core

Conversation

@ferrreo commented Jan 16, 2025:

Adds AMD Pref Core support to topology.rs as per #1206

@arighi (Contributor) left a comment:

Thanks, LGTM as a starting point.

I don't know much about AMD prefcore, but it'd be nice to also have some logic to classify preferred cores vs regular cores, possibly providing a method to get the list of these cores, so that schedulers can easily create separate scheduling domains, for example.

/// Currently used by AMD CPUs to indicate which cores should be preferred over others.
/// The higher the number, the higher the priority of the core.
/// This is set at boot time but can change at runtime via tunables.
pub pref_rank: usize,
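
For context, a value like this could be read from sysfs along the following lines; the per-CPU amd_pstate_prefcore_ranking attribute is an assumption based on the amd-pstate driver, not necessarily what this patch reads:

use std::fs;

/// Read the preferred-core ranking for one CPU (hypothetical helper).
/// The sysfs path is an assumption based on the amd-pstate driver; 0 is
/// returned when the attribute is missing (non-AMD system, prefcore off).
fn read_pref_rank(cpu: usize) -> usize {
    let path = format!(
        "/sys/devices/system/cpu/cpu{}/cpufreq/amd_pstate_prefcore_ranking",
        cpu
    );
    fs::read_to_string(&path)
        .ok()
        .and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(0)
}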
Contributor:

If it can change at runtime (via sysfs, I guess), would it make sense to also provide a method to refresh this value? This could be used by a scheduler while it's running.

@ferrreo (Author):

Makes sense, I'll have a look at adding that.

Contributor:

How often can this change? How often does it practically change? Should it be read on each access?

@ferrreo (Author):

In my current use case it gets changed when a game is launched, to rank one CCD higher than the other, and then changed back to rank the other CCD higher when the game is closed. But unless you have something that does that, or the user does it themselves, it won't change at all after boot.

I'm really not sure what the best way of managing this is, so I'm open to any ideas.

Contributor:

Are you on a CPU with 3D V-Cache on one CCD and not on the other? It looks like the updates are triggered from an ACPI device notification on the processor, but it doesn't seem to generate any notifications to userspace. I wonder how ACPI knows that a game has launched.

Hmm... The fair scheduler is using it to modify load-balancing decisions, basically to direct tasks towards higher-score CPUs. We can definitely use it in a similar way, and at least in the way that fair uses it, it seems to be treated similarly to how big/little cores are handled.

I'm not sure what the interface should be, but:

  1. We don't get notifications from the kernel on whether the values have changed or not (this we can improve later by updating amd_pstate_update_limits() to generate sysfs file-modified events), but I don't like the idea of letting each scheduler decide when to read, how frequently, and so on. It doesn't look like the ranking is going to change at high frequency (that wouldn't be useful anyway), so the topology crate can cache the values and, when they are needed, re-read them if the last read values are too old (e.g. over a second or whatever).
  2. Making the raw values accessible may be useful, but the values are unitless and all they seem to provide is a ranking, so let's provide it as an ordering - i.e. provide a sorted list and a comparison interface (is CPU A better than CPU B).
  3. Given that the ordering can change over time, it'd also be useful for topology to track since when the current ordering has been valid.
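
A minimal sketch of an interface along those lines, with a cached sorted list, a staleness window, and a generation counter; all names and the one-second window here are illustrative assumptions, not the actual scx_utils API:

use std::time::{Duration, Instant};

/// Illustrative cache of AMD preferred-core rankings.
struct PrefCoreRanking {
    /// CPU IDs sorted from highest to lowest rank.
    cpu_ids: Vec<usize>,
    /// Raw rank of each CPU, indexed by CPU ID.
    ranks: Vec<usize>,
    /// Bumped whenever the ordering actually changes.
    generation: u64,
    last_read: Instant,
    max_age: Duration,
}

impl PrefCoreRanking {
    /// Re-read the rankings (via the supplied reader) if the cache is stale.
    fn maybe_refresh(&mut self, read_ranks: impl Fn() -> Vec<usize>) {
        if self.last_read.elapsed() < self.max_age {
            return;
        }
        let ranks = read_ranks();
        let mut ids: Vec<usize> = (0..ranks.len()).collect();
        ids.sort_by_key(|&cpu| std::cmp::Reverse(ranks[cpu]));
        if ids != self.cpu_ids {
            self.generation += 1;
        }
        self.cpu_ids = ids;
        self.ranks = ranks;
        self.last_read = Instant::now();
    }

    /// Sorted list plus the generation it belongs to, so callers can tell
    /// whether the ordering changed since their last read.
    fn ranked_cpus(&self) -> (&[usize], u64) {
        (&self.cpu_ids, self.generation)
    }

    /// Is cpu_a ranked higher than cpu_b?
    fn is_higher_ranked(&self, cpu_a: usize, cpu_b: usize) -> bool {
        match (self.ranks.get(cpu_a), self.ranks.get(cpu_b)) {
            (Some(a), Some(b)) => a > b,
            _ => false,
        }
    }
}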

Contributor:

cc: @superm1

Would you mind clarifying how this is managed in 6.14+, given the recent changes?

@ferrreo (Author):

Yeah, the CPU has 3D V-Cache on one CCD and not the other. There is no ACPI mechanism to know when a game is launched. I've written a userspace daemon to do that and then adjust the cache-vs-frequency value for amd_x3d_mode in /sys/bus/platform/drivers/amd_x3d_vcache/, which in turn adjusts the pref core values in sysfs.

The usage should be very similar to big/little, yeah. For reference, this is useful outside of the 3D V-Cache use case, as it is also a general ordering of core quality and boostability of cores on AMD. Generally the better cores can boost higher, so things like a heavy single-threaded load should go to the highest-ranked core, etc.

So based on your points 2 and 3, a better approach might be: instead of having a value on each CPU in the topo, have a getter that returns a sorted list of CPU IDs that is cached internally for x seconds (I think this can be reasonably long, 10s or so), with a helper function to check if one CPU is ranked higher than another?

Contributor:

> Yeah, the CPU has 3D V-Cache on one CCD and not the other. There is no ACPI mechanism to know when a game is launched. I've written a userspace daemon to do that and then adjust the cache-vs-frequency value for amd_x3d_mode in /sys/bus/platform/drivers/amd_x3d_vcache/, which in turn adjusts the pref core values in sysfs.

Ah, I see. Yeah, writing to that file triggers an ACPI action which probably generates the notification on the processor object.

> The usage should be very similar to big/little, yeah. For reference, this is useful outside of the 3D V-Cache use case, as it is also a general ordering of core quality and boostability of cores on AMD. Generally the better cores can boost higher, so things like a heavy single-threaded load should go to the highest-ranked core, etc.

> So based on your points 2 and 3, a better approach might be: instead of having a value on each CPU in the topo, have a getter that returns a sorted list of CPU IDs that is cached internally for x seconds (I think this can be reasonably long, 10s or so), with a helper function to check if one CPU is ranked higher than another?

Yeah, something along that line. Mutability can be a bit tricky; hopefully this can be hidden behind an internal mutex or atomic variables so that the whole topology object doesn't have to be made mutable.
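
As a sketch of how that could look (hypothetical field and helper names, assuming the cache is just a sorted Vec plus a timestamp): the cache sits behind a Mutex inside the topology object, so the accessor can take &self and the rest of Topology stays immutable.

use std::sync::Mutex;
use std::time::{Duration, Instant};

struct RankCache {
    cpu_ids: Vec<usize>,
    last_updated: Instant,
}

struct Topology {
    // ...existing topology fields...
    pref_rank_cache: Mutex<RankCache>,
}

impl Topology {
    /// Takes &self; the Mutex provides the interior mutability needed to
    /// refresh the cached ordering lazily.
    fn ranked_cpus(&self) -> Vec<usize> {
        let mut cache = self.pref_rank_cache.lock().unwrap();
        if cache.last_updated.elapsed() > Duration::from_secs(10) {
            cache.cpu_ids = Self::read_sorted_ranking();
            cache.last_updated = Instant::now();
        }
        cache.cpu_ids.clone()
    }

    /// Placeholder for re-reading the per-CPU rankings from sysfs and
    /// sorting the CPU IDs from highest to lowest rank.
    fn read_sorted_ranking() -> Vec<usize> {
        Vec::new()
    }
}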

@ferrreo (Author):

@htejun I have updated based on your feedback; please let me know if this matches what we discussed. Also, if there is a better way of doing the mutability, let me know; I am no Rust expert (yet).

Another commenter:

> Ah, I see. Yeah, writing to that file triggers an ACPI action which probably generates the notification on the processor object.

Yes; when writing to that sysfs file it will use an ACPI device to communicate with the platform, and the platform will re-evaluate all rankings. When the new rankings are ready it will emit a CPPC Highest Performance changed notification (0x85). The amd-pstate driver will respond to this event and will re-read all the rankings. Those rankings will be fed into ITMT.

There is also a netlink event that occurs which I feel userspace could listen to if it's interested in reacting directly.
https://github.com/torvalds/linux/blob/v6.13/drivers/acpi/processor_driver.c#L86

@ferrreo (Author) commented Jan 16, 2025:

> Thanks, LGTM as a starting point.
>
> I don't know much about AMD prefcore, but it'd be nice to also have some logic to classify preferred cores vs regular cores, possibly providing a method to get the list of these cores, so that schedulers can easily create separate scheduling domains, for example.

So say we add a higher-level bool to indicate whether pref core is active globally, rather than relying on the scheduler checking for non-zero values on each core? Currently I don't think it is possible to have a mixed system, so it is either globally on and all cores are ranked, or it is off.

@ferrreo (Author) commented Jan 16, 2025:

@arighi I've pushed a commit with a fn to check whether pref ranks are enabled globally (via sysfs), as well as one to update the values for all CPUs.
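
For reference, such a check could look roughly like this; the global prefcore attribute under /sys/devices/system/cpu/amd_pstate/ is an assumption based on the amd-pstate driver, not necessarily the exact path the commit uses:

use std::fs;

/// Returns whether AMD preferred-core ranking is enabled globally
/// (hypothetical sketch; the sysfs path is an assumption).
fn has_pref_rank() -> bool {
    fs::read_to_string("/sys/devices/system/cpu/amd_pstate/prefcore")
        .map(|s| s.trim() == "enabled")
        .unwrap_or(false)
}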

@hodgesds (Contributor) commented:
Looks like a linter error; otherwise LGTM.

…us that caches its contents for 10s; also add is_higher_ranked to directly compare two cpuids in the rankings.

Remove the pref_rank from cpu struct in favour of the new mechanism.
@ferrreo force-pushed the scx_utils_amd_pref_core branch from 545cead to e11e01c on January 17, 2025 at 14:46
@@ -307,6 +336,62 @@ impl Topology {
}
sibling_cpu
}

/// Returns whether AMD preferred core ranking is enabled on this system
pub fn has_pref_rank(&self) -> bool {
Contributor:

I don't think this can change dynamically. This can be a static property.


/// Returns true if cpu_a has a higher rank than cpu_b.
/// If ranking is not enabled or either CPU is invalid, returns false.
pub fn is_higher_ranked(&self, cpu_a: usize, cpu_b: usize) -> bool {
Contributor:

So, if you store the pref value in an AtomicU32 in each CPU, this becomes trivial, and users can also read the raw value when they want to, with the accessor function refreshing the numbers as necessary. Note that storing into an atomic doesn't require a mutable ref.
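
A sketch of that suggestion (field and function names are illustrative): each CPU carries its rank in an AtomicU32, so a refresh can store new values through a shared reference and the comparison becomes a plain load-and-compare.

use std::sync::atomic::{AtomicU32, Ordering};

struct Cpu {
    id: usize,
    /// Preferred-core rank; can be updated in place without &mut.
    pref_rank: AtomicU32,
}

impl Cpu {
    /// Store a freshly read rank; works through a shared reference.
    fn update_pref_rank(&self, rank: u32) {
        self.pref_rank.store(rank, Ordering::Relaxed);
    }
}

/// Trivial once each CPU carries its own rank.
fn is_higher_ranked(cpu_a: &Cpu, cpu_b: &Cpu) -> bool {
    cpu_a.pref_rank.load(Ordering::Relaxed) > cpu_b.pref_rank.load(Ordering::Relaxed)
}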

cache.last_updated = Instant::now();
}

cache.cpu_ids.clone()
Contributor:

I think this can avoid copying the array each time by using Arc::make_mut() when updating and returning an Arc clone.

It may also be useful to record and report a generation number together so that the caller can determine whether the order has changed since the last read (e.g. so that the scheduler can skip certain operations if not).
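
A rough sketch of that combination (illustrative names): the cache holds an Arc<Vec<usize>>, hands out cheap Arc clones together with a generation number, and uses Arc::make_mut() so the Vec is only copied when an old snapshot is still alive.

use std::sync::Arc;

struct RankCache {
    cpu_ids: Arc<Vec<usize>>,
    generation: u64,
}

impl RankCache {
    /// Install a new ordering. Arc::make_mut() clones the Vec only if a
    /// caller still holds a reference to the old snapshot (copy-on-write).
    fn update(&mut self, new_order: &[usize]) {
        if self.cpu_ids.as_slice() != new_order {
            *Arc::make_mut(&mut self.cpu_ids) = new_order.to_vec();
            self.generation += 1;
        }
    }

    /// Cheap to call: callers get a shared handle plus the generation, so
    /// they can skip work when nothing has changed since their last read.
    fn snapshot(&self) -> (Arc<Vec<usize>>, u64) {
        (Arc::clone(&self.cpu_ids), self.generation)
    }
}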

Comment on lines +351 to +353
/// Returns a sorted list of CPU IDs from highest to lowest rank.
/// The list is cached internally and refreshed every 10 seconds.
/// If preferred core ranking is not enabled, returns an empty slice.
The same commenter:

rather than running every 10 seconds, how about using that netlink event I suggested?
https://github.com/torvalds/linux/blob/v6.13/drivers/acpi/processor_driver.c#L86
