MangoHud Intel GPU metrics #1082

Closed
flightlessmango opened this issue Jul 25, 2023 · 52 comments · Fixed by #1499

Comments

@flightlessmango
Owner

This issue is for tracking the progress and issues with Intel GPU metrics. dGPU and iGPU.

Current state

  • intel_gpu_top has to be installed and setcap'd, e.g. sudo setcap cap_perfmon=+ep /usr/bin/intel_gpu_top
  • We are set up to get GPU load at least from intel_gpu_top
  • MangoHud is only able to run intel_gpu_top unhindered outside of proton/runtime
  • In order for us to run intel_gpu_top you need to have flatpak installed and set sniper runtime to beta
@Cris-lml007

Running sudo setcap cap_perfmon=+ep /usr/bin/intel_gpu_top did not work for me, although I saw that it works for others. What could be the reason?

@flightlessmango
Owner Author

What mangohud version are you using?

@Cris-lml007

version 0.7.0

@nokia8801
Contributor

It does not work for me either with sudo setcap cap_perfmon=+ep /usr/bin/intel_gpu_top.

getcap /usr/bin/intel_gpu_top shows it is applied correctly.
/usr/bin/intel_gpu_top cap_perfmon=ep
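
The getcap check above could be scripted roughly as follows. This is a minimal sketch, not part of MangoHud: the helper name has_cap_perfmon is hypothetical, and it assumes the "path caps=flags" output format shown in this comment (other libcap versions may print it differently, and paths containing spaces are not handled).

```python
def has_cap_perfmon(getcap_line):
    """Return True if a getcap output line grants cap_perfmon with the e and p flags.

    Assumes the format seen in this thread: '<path> <cap>[,<cap>...]=<flags>'.
    """
    parts = getcap_line.split()
    if len(parts) < 2:
        return False  # no capability clause after the path
    for clause in parts[1:]:
        names, sep, flags = clause.partition("=")
        if not sep:
            continue
        # Require both the effective (e) and permitted (p) flags.
        if "cap_perfmon" in names.split(",") and {"e", "p"} <= set(flags):
            return True
    return False
```

As the error output in this comment shows, a correct capability on the binary does not guarantee that the PMU can actually be opened, so this check only rules out one cause.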

Failed to initialize PMU! (Permission denied)

When running as a normal user CAP_PERFMON is required to access performance
monitoring. See "man 7 capabilities", "man 8 setcap", or contact your
distribution vendor for assistance.

More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
[2024-01-22 15:52:48.046] [MANGOHUD] [info] [intel.cpp:74] Missing permissions for 'intel_gpu_top'
[2024-01-22 15:52:48.046] [MANGOHUD] [info] [intel.cpp:76] Disabling gpu_stats

MangoHud 0.7.0-2+
Arch Linux
GNOME 45.3 on Wayland
Linux 6.7.0-arch3-1

@tazz4843

tazz4843 commented Feb 8, 2024

MangoHud is only able to run intel_gpu_top unhindered outside of proton/runtime
In order for us to run intel_gpu_top you need to have flatpak installed and set sniper runtime to beta

These two points seem to be fixed, I have everything working in Proton 8.0-5, v0.7.0-2+ with no hitches. GPU load and frequency function without issues.

Only major remaining thing that would be nice is GPU power draw.

@kira-bruneau
Contributor

I was just wondering, is there a specific reason why intel gpu metrics are collected with intel_gpu_top? It looks like it should be possible to obtain the same stats through the drm & hwmon kernel interfaces, like what's done in resources: https://github.com/nokyan/resources/blob/v1.4.0/src/utils/gpu/intel.rs#L38.

That way you wouldn't need to rely on intel_gpu_top being installed and having it setcap'd.

@flightlessmango
Owner Author

is there a specific reason why intel gpu metrics are collected with intel_gpu_top

Most of these metrics require root access, and as a shared library we can't access root-only resources without a middleman program.

@jcsstelar

Hi, after upgrading from 0.7.1 to 0.7.2 (flatpak), my GPU usage is always 0%. My iGPU is an HD 610. I'm on Ubuntu 22.04.4 and I use MangoHud with RetroArch, Lutris, and Steam. Not sure if I can help, I'm new to Linux. Love your work <3

@zeptic99

Hi, after upgrading from 0.7.1 to 0.7.2 (flatpak), my GPU usage is always 0%. My iGPU is an HD 610. I'm on Ubuntu 22.04.4 and I use MangoHud with RetroArch, Lutris, and Steam. Not sure if I can help, I'm new to Linux. Love your work <3

Same here, using an Intel Arc A380.

@KF-Art

KF-Art commented Sep 28, 2024

Setcapping intel_gpu_top made it show the usage percentage, but only when I'm running an OpenGL game. With Vulkan, it stays at 0%. The same happens if the game runs on the card that was not used to start Xorg. For example, if I use my dGPU to start Xorg and start a game on the iGPU (both Intel, of course), it stays at 0%. Maybe the -d flag is needed here?

I'm using Intel Iris Xe as iGPU and Intel Iris Xe MAX as dGPU (both working with i915 driver).
OS: Void Linux x86_64.
Kernel: 6.11.0_1
Mesa version: 24.1.5_1
Flatpak is not being used. This behavior also happens with Wine, emulators like Dolphin or native games like Slime Rancher Demo (running through Steam).

@eclairevoyant

pretty sure igt doesn't support xe yet, unless I've missed some recent release or such.

@KF-Art

KF-Art commented Sep 29, 2024

pretty sure igt doesn't support xe yet, unless I've missed some recent release or such.

Well, at least in my case it seems to work correctly. I don't know if all features are supported yet, though.

image

image

(The second one is the Intel Iris Xe MAX discrete GPU).

But if I use the Xe driver instead of i915, it is only able to show the iGPU metrics. When I try to see dGPU metrics, it gives me this error:

Failed to detect engines! (No such file or directory)
(Kernel 4.16 or newer is required for i915 PMU support.)

@retrixe
Contributor

retrixe commented Nov 23, 2024

For the new Xe kernel driver, gputop is the new monitoring utility for getting GPU utilisation stats

Xe KMD is the new default on my Lunar Lake system so unfortunately GPU utilisation under mangohud is totally unavailable

FWIW gputop is supposed to be vendor agnostic as Xe KMD exports utilisation information through the "DRM client usage stats" specification which other kernel drivers can implement as well, see:

Apparently panthor and panfrost (for ARM Mali GPUs) also use this specification, and i915 KMD (for less recent Intel GPUs) exposes it too

@flightlessmango
Owner Author

Should be easily added into mangohud.
Just not sure which is which regarding rcs, ccs, bcs, vcs, vecs

@retrixe
Contributor

retrixe commented Nov 24, 2024

These docs explain the meaning of these fields: https://dri.freedesktop.org/docs/drm/gpu/i915.html#intel-gpu-basics

It seems RCS is what "Render/3D" refers to in intel_gpu_top as the segment responsible for compute and 3D rendering, so I guess it's the field that's most relevant to mangohud

Games on my system only affect rcs as well whereas the other engines are at 0% utilisation

@flightlessmango
Owner Author

Yeah that seems about right

@flightlessmango
Owner Author

So specifically drm-cycles-rcs for util?

@17314642
Contributor

https://www.kernel.org/doc/Documentation/gpu/i915.rst
CTRL + F for "Render Command Streamer"

in short:
rcs - Render Command Streamer (3D Graphics)
bcs - Blitting Command Streamer (Data Copying)
vcs - Video Command Streamer (Hardware Encoding and Decoding (H.264, HEVC, AV1))
vecs - Video Enhancement Command Streamer (Video Super Resolution)
ccs - Compute Command Streamer (GPGPU like AI and OpenCL, Hashcat)

@retrixe
Contributor

retrixe commented Nov 24, 2024

gputop reports output like this. As far as I can tell, it doesn't have an option to report JSON, CSV, or anything else machine-readable.

Details
DRM minor 0                                                                                     
  PID      MEM      RSS     rcs        vcs       vecs        bcs        ccs     NAME            
 2572       1G       1G |  0.8% ▏ ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | gnome-shell     
 5816       0B       0B |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | Xwayland        
DRM minor 128                                                                                   
  PID      MEM      RSS     rcs        vcs       vecs        bcs        ccs     NAME            
 2572       0B       0B |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | gnome-shell     
 3595       0B       0B |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | ptyxis          
 3595     218M     218M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | ptyxis          
 5758       0B       0B |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | chrome          
 5816      14M      14M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | Xwayland        
 5851     308K     308K |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | gsd-xsettings   
 5815     120M     120M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | chrome          
 5940       2M       2M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | mutter-x11-fram 
 5815     394M     394M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | chrome          
 5815       0B       0B |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | chrome          
 7193      36M      36M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | Discord         
 7193     181M     181M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | Discord         
 7193       0B       0B |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | Discord         
 3595       6M       6M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | ptyxis          
68586       0B       0B |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | gnome-system-mo 
68586      71M      71M |  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   ||  0.0%   | gnome-system-mo 
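
Since gputop has no machine-readable mode, its text output could in principle be scraped. The following is a rough sketch only, assuming the exact layout shown above ("DRM minor N" lines, a header row ending in NAME, and "|"-separated rows); parse_gputop is a hypothetical helper, not something gputop or MangoHud provides, and the layout may change between gputop versions.

```python
import re

def parse_gputop(text):
    """Parse gputop's text table into a list of per-client dictionaries."""
    rows = []
    minor = None
    engines = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:2] == ["DRM", "minor"]:
            minor = int(fields[2])
        elif fields[0] == "PID":
            # Header row: PID MEM RSS <engine names...> NAME
            engines = fields[3:-1]
        elif "|" in line:
            # Data row: every percentage belongs to one engine, in header order.
            busy = [float(p) for p in re.findall(r"(\d+(?:\.\d+)?)%", line)]
            rows.append({
                "minor": minor,
                "pid": int(fields[0]),
                "name": line.rsplit("|", 1)[1].strip(),
                "busy": dict(zip(engines, busy)),
            })
    return rows
```

Reading the DRM client usage stats directly is likely more robust than scraping this table, since it does not depend on a display format.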

So specifically drm-cycles-rcs for util?

If you're reading the DRM client usage stats directly: looking at gputop's code, it seems to take the deltas of drm-cycles-rcs and drm-total-cycles-rcs over a sampling period (2 seconds by default), divide the increase in cycles by the increase in total cycles, and then divide again by drm-engine-capacity-rcs (1 if absent, according to the Linux docs) to get the utilisation of the RCS engine.
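
The calculation described here can be sketched as below. This is an illustration under assumptions, not gputop's or MangoHud's actual code: the helper names are hypothetical, and the input is assumed to be the "key: value" text of a client's fdinfo file, sampled twice.

```python
def parse_fdinfo(text):
    """Collect the 'drm-*' key/value pairs from fdinfo-style text."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().startswith("drm-"):
            stats[key.strip()] = value.strip()
    return stats

def engine_utilisation(before, after, engine="rcs"):
    """Utilisation in percent of one engine between two parsed samples."""
    d_cycles = int(after[f"drm-cycles-{engine}"]) - int(before[f"drm-cycles-{engine}"])
    d_total = (int(after[f"drm-total-cycles-{engine}"])
               - int(before[f"drm-total-cycles-{engine}"]))
    # Capacity defaults to 1 when the driver does not report it.
    capacity = int(after.get(f"drm-engine-capacity-{engine}", 1))
    if d_total <= 0 or capacity <= 0:
        return 0.0
    return 100.0 * d_cycles / d_total / capacity
```

For example, if drm-cycles-rcs grows by 500 while drm-total-cycles-rcs grows by 2000 between two samples, the RCS engine was busy for a quarter of that interval.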

@PerAstraAdDeum

With the release of the 6.12 kernel I made the switch to the xe driver, and so far it's been smooth sailing. I'd love for mangohud to display GPU metrics. Is there any way I can feed the gputop data to mangohud?

@17314642
Contributor

17314642 commented Nov 25, 2024

@PerAstraAdDeum support for the xe driver will be available by the end of the week in master branch, so you'll have to compile it yourself or wait for a new release.

@PerAstraAdDeum

Ohh nice! Can I just install mangohud-git from the AUR once support for the xe driver has landed in the master branch, or do I need to compile it then?

@17314642
Contributor

@PerAstraAdDeum I think you can just install mangohud-git because yay compiles it for you.

While we're at it, can you launch this in terminal?

ls /sys/class/drm/renderD*/device/hwmon/hwmon*/energy*
ls -ld /sys/class/drm/renderD*/device/driver

I want to compare against my intel card if everything's the same.
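
The second command identifies the kernel driver by following the driver symlink of each render node. A small sketch of the same lookup follows; render_node_drivers is a hypothetical helper name, and the drm_root parameter exists only so the function can be pointed at a different directory than /sys/class/drm.

```python
import glob
import os

def render_node_drivers(drm_root="/sys/class/drm"):
    """Map each render node name (e.g. 'renderD128') to its kernel driver name."""
    drivers = {}
    pattern = os.path.join(drm_root, "renderD*", "device", "driver")
    for link in sorted(glob.glob(pattern)):
        node = os.path.basename(os.path.dirname(os.path.dirname(link)))
        # 'driver' is a symlink whose target ends in the driver name, e.g. .../drivers/xe
        drivers[node] = os.path.basename(os.path.realpath(link))
    return drivers
```

On a system with one i915 device and one xe device, this would be expected to return one entry per render node, each naming its driver.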

@PerAstraAdDeum

@PerAstraAdDeum I think you can just install mangohud-git because yay compiles it for you.

While we're at it, can you launch this in terminal?

ls /sys/class/drm/renderD*/device/hwmon/hwmon*/energy*
ls -ld /sys/class/drm/renderD*/device/driver

I want to compare against my intel card if everything's the same.

I'm on MTL iGPU! (Intel(R) Core(TM) Ultra 7 155H with Intel Arc Graphics). Here's the output of the commands you gave me:

ls /sys/class/drm/renderD*/device/hwmon/hwmon*/energy*

-->

ls: cannot access '/sys/class/drm/renderD*/device/hwmon/hwmon*/energy*': No such file or directory

ls -ld /sys/class/drm/renderD*/device/driver

-->

lrwxrwxrwx 1 root root 0 Nov 25 13:20 /sys/class/drm/renderD128/device/driver -> ../../../bus/pci/drivers/xe

Hope that helps!

@17314642
Contributor

17314642 commented Nov 25, 2024

Thank you, and I forgot one last command:

ls /sys/class/drm/renderD128/device/hwmon/*

I'm checking whether I need to account for different energy files to get power usage. In your case you don't have a file called energy2_input, so power usage might not work.
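
For context, hwmon energy*_input files report cumulative energy in microjoules, so power has to be derived from two readings taken some time apart. The following is a minimal sketch of that arithmetic, not MangoHud's implementation; power_watts is a hypothetical helper, and reading the file and measuring the elapsed time are left to the caller.

```python
def power_watts(energy_before_uj, energy_after_uj, elapsed_s):
    """Average power in watts between two cumulative energy readings (microjoules)."""
    if elapsed_s <= 0:
        return None
    delta_uj = energy_after_uj - energy_before_uj
    if delta_uj < 0:
        # The counter wrapped or was reset; skip this sample rather than
        # report a negative power value.
        return None
    return delta_uj / 1_000_000 / elapsed_s
```

For instance, a counter that advances by 2,500,000 µJ over 0.5 s corresponds to an average of 5 W.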

@PerAstraAdDeum

ls /sys/class/drm/renderD128/device/hwmon/*

-->

ls: cannot access '/sys/class/drm/renderD128/device/hwmon/*': No such file or directory

Something else I can do?

@17314642
Contributor

17314642 commented Nov 25, 2024

ahahah, yeah..., do you have hwmon at all?

ls /sys/class/drm/renderD128/device/

@PerAstraAdDeum

Nope.

And if ls /sys/class/drm/*/device/hwmon/* works as I suspect, none of the other drm entries has hwmon either.

... what does that mean? No metrics for me? 😭

@17314642
Contributor

It means no power usage for you, probably because it's a laptop

@PerAstraAdDeum

Oh well, I can live with that. 😆

Anything else to check?

@17314642
Contributor

No, that's all. Thank you once again.

@PerAstraAdDeum

Sure, glad I could help! I'll be waiting for the xe driver support to land in master and then report back.

@nokia8801
Contributor

nokia8801 commented Dec 3, 2024

I just compiled mangohud-git after the latest commits ae4c411 and 6c49103, but unfortunately it is still broken with an Arc A750 using the Xe driver. It does not show usage percentage or temperature, and the reported power usage is over -100000 W and keeps climbing without stopping. No VRAM either, though that could be related to Wine.

image

Most efficient card on the planet, supplies infinite power to the world 😄

Edit: I don't think Wine is the issue since Minecraft has the same problem.

image

$ ls /sys/class/drm/renderD*/device/hwmon/hwmon*/energy*
/sys/class/drm/renderD129/device/hwmon/hwmon2/energy2_input  /sys/class/drm/renderD129/device/hwmon/hwmon2/energy2_label

$ ls -ld /sys/class/drm/renderD*/device/driver
lrwxrwxrwx 1 root root 0 Dec  3 08:15 /sys/class/drm/renderD128/device/driver -> ../../../bus/pci/drivers/i915
lrwxrwxrwx 1 root root 0 Dec  3 08:15 /sys/class/drm/renderD129/device/driver -> ../../../../../../bus/pci/drivers/xe

@17314642
Contributor

17314642 commented Dec 3, 2024

@nokia8801 can I contact you on discord for debugging purposes?

@PerAstraAdDeum

@nokia8801 can I contact you on discord for debugging purposes?

I'm not Nokia and I'm not on discord, but if you need me to test something (Arc iGPU with xe driver, Arch Linux) for the next release, let me know.

@17314642
Contributor

17314642 commented Dec 3, 2024

@PerAstraAdDeum do you have same issues as nokia? if yes, I can use your help too.

@nokia8801
Contributor

Of course I can help mate, but I'm also not on Discord. I do have Matrix, I use the Fractal client.

@PerAstraAdDeum

+1 for Matrix, I'm there too.

And I don't have the newest version yet, according to yay mangohud-git is out-of-date. Should I compile manually?

@nokia8801
Contributor

mangohud-git is out of date because some newbie user flagged it as such. It is a git package so it always builds the latest version. Just go ahead and compile/install it.

@17314642
Contributor

17314642 commented Dec 3, 2024

okay, I'll try to use matrix, don't think I used it before. Can either of you create a matrix group so we could coordinate there?

@PerAstraAdDeum you would need to compile it manually, yeah. I don't use Arch Linux and I thought that yay downloads the latest git versions. Apparently it's also manually updated. The last update for the git version was on the 19th of June, which is quite old for a git version.

@PerAstraAdDeum

Sure thing, but sadly it fails:
image

Full meson-log.txt:

Build started at 2024-12-03T13:14:57.390071
Main binary: /usr/bin/python
Build Options: -Db_pie=true -Dpython.bytecompile=1 -Dmangoapp=true -Dmangohudctl=true -Dmangoapp_layer=true -Dprefix=/usr -Dlibexecdir=lib -Dsbindir=bin -Dauto_features=enabled -Dbuildtype=plain -Dwrap_mode=default
Python system: Linux
The Meson build system
Version: 1.6.0
Source dir: /home/ian/.cache/yay/mangohud-git/src/MangoHud
Build dir: /home/ian/.cache/yay/mangohud-git/src/build
Build type: native build

MangoHud/meson.build:1:0: ERROR: Unknown options: "mangoapp_layer"

@PerAstraAdDeum

okay, I'll try to use matrix, don't think I used it before. Can either of you create a matrix group so we could coordinate there?

@PerAstraAdDeum you would need to compile it manually, yeah. I don't use Arch Linux and I thought that yay downloads the latest git versions. Apparently it's also manually updated. The last update for the git version was on the 19th of June, which is quite old for a git version.

Okay, gonna try manually now.

Hey, regarding Matrix: why not open a Room for Mangohud in general?

@17314642
Contributor

17314642 commented Dec 3, 2024

I don't follow, you mean like create an alternative to discord server but on matrix?

@nokia8801
Contributor

Here is the room link. You should download a Matrix client and also create a matrix account.

MangoHud | Intel Xe Driver

@Lassebq

Lassebq commented Dec 8, 2024

Compiled the latest git on my iGPU-only laptop and I can see the usage percent but not the frequency. I can query the frequency from sysfs like this:
cat /sys/class/drm/card1/gt/gt0/rps_cur_freq_mhz
The GPU is an Intel UHD 620.

@17314642
Contributor

17314642 commented Dec 8, 2024

@Lassebq woah, I didn't know you could get frequency via sysfs. What's your CPU?

Anyways, currently mangohud doesn't know how to do that, but I can try and add that if you could help me test it.

@Lassebq

Lassebq commented Dec 8, 2024

Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz

@17314642
Contributor

17314642 commented Dec 8, 2024

Woah again, I didn't even know i915 had that functionality. I'll add frequency support in a few days. Note that this will only work for the i915 driver and not xe, because xe doesn't have a gt directory (at least on my PC with an Arc A770).

@17314642
Contributor

17314642 commented Dec 8, 2024

@Lassebq added in pr #1499

@Lassebq

Lassebq commented Dec 8, 2024

Clock speed works with that PR, however it seems a bit different from the value reported in intel_gpu_top?
The value from sysfs only goes as low as 300 MHz, while intel_gpu_top shows values below 100 MHz. I'm not sure why that is. At full load, intel_gpu_top also hovers around 1000-1100 MHz, while the sysfs value is a constant 1150 MHz.

@17314642
Contributor

17314642 commented Dec 8, 2024

@Lassebq intel_gpu_top gathers statistics via perf events, not sysfs, maybe there are some different values. Also polling period might play a role. Mangohud checks every 500ms, while intel_gpu_top does it every 2 seconds, maybe that can affect the output.

I'll compare intel_gpu_top and mangohud and check if something can be done about this.

If you can, please join the Matrix room for Intel MangoHud discussion. It will be easier to chat there instead of cluttering GitHub.

Discord is also fine.

Matrix: MangoHud | Intel Xe Driver
Discord: official mangohud discord server

@17314642
Contributor

17314642 commented Dec 9, 2024

@Lassebq

I indeed see that intel_gpu_top and mangohud are showing different values. This is because of different methods of gathering gpu frequency.

  1. intel_gpu_top uses perf events, which require root permissions or setting "sysctl kernel.perf_event_paranoid=0"
  2. mangohud reads "/sys/bus/pci/devices/0000:03:00.0/drm/card1/gt_act_freq_mhz", which only reports frequency in increments of 50 MHz

I truly don't know which way of gathering is the correct one, but considering that perf_events by default require root, second method seems more natural and I'll keep it as is.
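
The sysfs approach can be sketched as follows. This is only an illustration, not the code from the PR: read_gpu_freq_mhz and FREQ_FILES are hypothetical names, and the two relative paths are the ones mentioned in this thread, which may not exist on every driver or kernel.

```python
import os

# Candidate files relative to a card directory such as /sys/class/drm/card1.
# Both names come from this thread; which ones exist depends on driver and kernel.
FREQ_FILES = (
    "gt_act_freq_mhz",
    os.path.join("gt", "gt0", "rps_cur_freq_mhz"),
)

def read_gpu_freq_mhz(card_dir):
    """Return the GPU frequency in MHz from the first readable file, else None."""
    for rel in FREQ_FILES:
        try:
            with open(os.path.join(card_dir, rel)) as f:
                return int(f.read().strip())
        except (OSError, ValueError):
            continue  # file missing, unreadable, or not a plain integer
    return None
```

Because these files are world-readable on the systems described here, this path needs neither root nor setcap, which is the trade-off against the perf-event method.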

image
