Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow Updates for PAPI GitHub CI #273

Merged

Conversation

Treece-Burgess
Copy link
Contributor

@Treece-Burgess Treece-Burgess commented Oct 30, 2024

Pull Request Description

This PR addresses updating the GitHub CI for PAPI. Below will be the outlined changes.

Individual Component Tests

If a component has changes to it (including subdirectories such as tests), CI tests will only run for that specific component with four options for when we build PAPI:

  • --with-debug=yes and --with-debug=no
  • --with-shlib-tools and without --with-shlib-tools

Therefore only four tests will ever be run if changes are made to src/components/*/**. Each component will have a .yml file (e.g. cuda_component.yml) and use the ci_per_component.sh script.

Counter Analysis Toolkit

The Counter Analysis Toolkit did not previously have CI tests for it. With this PR, CI tests are added for it. These additions can be seen with cat.yml and ci_cat.sh.

As above, CI tests for the Counter Analysis Toolkit will only be ran if changes are done to src/counter_analysis_toolkit/**. As well as building PAPI will be done with the four options discussed in the Individual Component Tests section.

PAPI Framework

If changes are made to the overall PAPI framework (src/** excluding individual components as well as the Counter Analysis Toolkit), then we will run a full test suite. This will encompass 14 tests and is currently the CI being ran in this PR.

Adding a run_tests_shlib.sh Script

This script is being added to only run a single component test when the --with-shlib-tools option is enabled. Such that we can lower the overall GitHub CI time.

Additional Comments

I have individually checked the following:

  • Individual component CI tests
  • Counter analysis toolkit CI tests

to make sure that they only run when an update to their directory or subdirectories has been made.

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

@Treece-Burgess Treece-Burgess added the status-work-in-progress PR is still being worked on label Oct 30, 2024
@Treece-Burgess Treece-Burgess added type-bug Issues discussing bugs or PRs fixing bugs and removed type-bug Issues discussing bugs or PRs fixing bugs labels Nov 7, 2024
@Treece-Burgess Treece-Burgess added component-cuda PRs and Issues related to the cuda component component-rocm PRs and Issues related to the rocm component status-work-in-progress PR is still being worked on and removed component-cuda PRs and Issues related to the cuda component status-work-in-progress PR is still being worked on component-rocm PRs and Issues related to the rocm component labels Nov 8, 2024
@Treece-Burgess Treece-Burgess self-assigned this Nov 26, 2024
@Treece-Burgess Treece-Burgess added type-feature Issues that request a new feature or PRs that add a new feature and removed status-work-in-progress PR is still being worked on labels Nov 26, 2024
@Treece-Burgess Treece-Burgess mentioned this pull request Nov 26, 2024
3 tasks
@Treece-Burgess Treece-Burgess changed the title Workflow Updates for GitHub CI Workflow Updates for PAPI GitHub CI Nov 26, 2024
@G-Ragghianti
Copy link
Contributor

Since you have the path-based conditional run configuration, we can't see if all the checks are able to run correctly. Can you temporarily comment out the conditional path instructions in the yaml files so that all checks will run? Once it is verified, you can enable them for the final commit.

@G-Ragghianti
Copy link
Contributor

The failing Infiniband checks are running on the correct host (guyot), and the host has one active IB port, but the component is still disabled. If it you are able to get the component to activate when you do a similar check in a login session, then it is probably the case that we need to provide device access to the docker container that is running the github actions. Can you verify? Do you know which devices are required?

@Treece-Burgess
Copy link
Contributor Author

@G-Ragghianti For the path based configuration for the individual components and CAT, I have already checked them once with them running successfully. However, I have made a few changes so I will run them again to be safe.

For the infiniband component, when I am in a login session on Guyot and test building PAPI with the infiniband component it does show as active:

./configure --prefix=$PWD/test-ci --with-components="infiniband"
make && make install
utils/papi_component_avail
Active components:
Compiled-in components:
Name:   perf_event              Linux perf_event CPU counters
Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
Name:   infiniband              Linux Infiniband statistics using the sysfs interface
Name:   sysdetect               System info detection component

Active components:
Name:   perf_event              Linux perf_event CPU counters
                                Native: 141, Preset: 17, Counters: 5
                                PMUs supported: perf, perf_raw, amd64_fam17h_zen2

Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
                                Native: 1, Preset: 0, Counters: 3
                                PMUs supported: amd64_rapl

Name:   infiniband              Linux Infiniband statistics using the sysfs interface
                                Native: 46, Preset: 0, Counters: 46

Name:   sysdetect               System info detection component
                                Native: 0, Preset: 0, Counters: 0

The infiniband component checks for /sys/class/infiniband/ and will auto-detect all active IB devices and associated ports.

@G-Ragghianti
Copy link
Contributor

Re: infiniband
I think the infiniband needs more that /sys/class/infiniband/ . That location is the same on guyot directly and in the github action container. Also, I was able to run "ibstat" in the container with no problem. Can you run papi_component_avail on guyot with strace and see what files it opens? If you want to just take the log of the strace, I can go through it.

@Treece-Burgess
Copy link
Contributor Author

Re: infiniband I think the infiniband needs more that /sys/class/infiniband/ . That location is the same on guyot directly and in the github action container. Also, I was able to run "ibstat" in the container with no problem. Can you run papi_component_avail on guyot with strace and see what files it opens? If you want to just take the log of the strace, I can go through it.

Here is the strace (strace_infiniband.txt) on Guyot for running papi_component_avail with the infiniband component compiled in.

@G-Ragghianti
Copy link
Contributor

Re: infiniband I think the infiniband needs more that /sys/class/infiniband/ . That location is the same on guyot directly and in the github action container. Also, I was able to run "ibstat" in the container with no problem. Can you run papi_component_avail on guyot with strace and see what files it opens? If you want to just take the log of the strace, I can go through it.

Here is the strace (strace_infiniband.txt) on Guyot for running papi_component_avail with the infiniband component compiled in.

I have exposed the infiniband device to the container, so the component should now activate. Please verify.

@Treece-Burgess
Copy link
Contributor Author

@G-Ragghianti The infiniband tests are now working properly.

@Treece-Burgess Treece-Burgess added update-ci PRs updating PAPI CI status-ready-to-merge PR is ready to be merged into master priority-normal PRs and Issues that are of normal priority and removed type-feature Issues that request a new feature or PRs that add a new feature labels Dec 18, 2024
@Treece-Burgess Treece-Burgess force-pushed the 10.30.24-github-ci branch 2 times, most recently from e95ac22 to 1ae2097 Compare January 2, 2025 17:44
@G-Ragghianti
Copy link
Contributor

The cuda component tests are a complete waste of time because they are failing due to timeout (1 hour limit).

@Treece-Burgess
Copy link
Contributor Author

The cuda component tests are a complete waste of time because they are failing due to timeout (1 hour limit).

There was an erroneous line of code in the run_tests.sh script. I have fixed it and it does not hit the time limit anymore. Thank you for pointing that out.

@Treece-Burgess Treece-Burgess merged commit f8e4a2c into icl-utk-edu:master Jan 6, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority-normal PRs and Issues that are of normal priority status-ready-to-merge PR is ready to be merged into master update-ci PRs updating PAPI CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants