Keep image up-to-date with its components #8
Comments
Hi @jgiannuzzi. Thanks for this issue, and I agree. However, what do you say about splitting this into three separate issues, one issue per component version? I would like to implement the GitHub Actions Runner version fix ASAP, and if I went that way, I would close this ticket. It looks like the GitHub Actions Runner version is causing an actual problem, and maybe GCP wasn't actually preempting instances. It looks like the Runner process wasn't able to start at all, because 2.292.0 is an "old" version; 2.294.0 is the latest. I did some manual setups in AWS, tried to register runner 2.292.0, and failed. The error was "Runner too old". This might be the reason why ILGPU cannot run CUDA jobs at the moment. So the runner version fix looks like a high priority to me. What do you think?
Totally agree. Go ahead and split the issue!
Closing this issue. A scheduled workflow checks for new versions of the runner. Two new issues were added for the Ubuntu version and the NVIDIA drivers:
Describe the improvement request
The following components should be kept up-to-date in the image:
- Ubuntu base image
- GitHub Actions Runner
- NVIDIA drivers
I think that a daily scheduled workflow could be used to check for updates to each of those components, and create a PR with the new version number set in `variables.auto.pkrvars.hcl` when an update is detected. We may also want to have a workflow running on PRs that builds and tests the image, but does not save it.

Here is how I think the check for updates could be done for each component:
Ubuntu base image
Use `gcloud` to query for the latest image in the `ubuntu-2004-lts` family:
`gcloud compute images list --filter family=ubuntu-2004-lts --format "value(NAME)"`
GitHub Actions Runner
Use the GitHub REST API to query for the latest release (the `v` prefix will need to be removed):
`gh api /repos/actions/runner/releases/latest | jq -r .tag_name`
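The prefix removal and the "is there an update?" check could be sketched as follows. This is a minimal sketch, not the workflow that was eventually implemented: the function names `strip_v_prefix` and `is_newer` are hypothetical, and the comparison assumes GNU `sort -V` is available on the machine running the check.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from the repository.

# Strip a leading "v" from a release tag such as "v2.294.0".
strip_v_prefix() {
  local tag="$1"
  printf '%s\n' "${tag#v}"
}

# Succeed (exit 0) only when $2 is a strictly newer version than $1.
# Assumes GNU `sort -V` for version-aware ordering.
is_newer() {
  local current="$1" candidate="$2"
  [ "$current" != "$candidate" ] &&
    [ "$(printf '%s\n%s\n' "$current" "$candidate" | sort -V | tail -n 1)" = "$candidate" ]
}
```

A scheduled job might then run `latest="$(strip_v_prefix "$(gh api /repos/actions/runner/releases/latest | jq -r .tag_name)")"` and open a PR only if `is_newer "$current" "$latest"` succeeds, where `$current` is the version currently pinned in the image.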
NVIDIA drivers
The decision to go from one major version to another should be made by a human (e.g. by updating the scheduled workflow).
The latest datacenter driver versions can be found on https://docs.nvidia.com/datacenter/tesla/index.html. We should probably parse this HTML file (e.g. with https://pypi.org/project/beautifulsoup4/ or https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString) and figure out the latest version for a given major (e.g. `470` -> `470.103.01` at the time of writing).

We don't need to have all 3 components done before we can roll out this feature. It seems to me that the first 2 are low-hanging fruit and we would immediately benefit from having these always up-to-date.
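For the NVIDIA drivers, the step of picking the latest version within a pinned major could look roughly like this. The sketch deliberately leaves out the HTML parsing: it assumes the version strings have already been extracted from the page, one per line, and the function name `latest_for_major` is hypothetical. It also assumes GNU `sort -V`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads candidate driver versions from stdin (one per
# line) and prints the highest one whose major version equals $1.
# Prints nothing when no version matches the requested major.
latest_for_major() {
  local major="$1"
  # Keep only versions of the form <major>.<n>[.<n>...], then take the
  # highest one according to version-aware sorting (GNU `sort -V`).
  grep -E "^${major}\.[0-9]+(\.[0-9]+)*$" | sort -V | tail -n 1
}
```

For example, feeding the illustrative list `470.82.01`, `470.103.01`, `510.47.03` to `latest_for_major 470` selects `470.103.01`, while the `510` entry is ignored until a human changes the pinned major.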
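Once a newer version has been detected, the workflow has to write it into `variables.auto.pkrvars.hcl` before opening the PR. A minimal sketch of that step is shown here; the helper name `set_pkrvar` and the variable name `runner_version` used in the usage note are assumptions, since the issue does not say how the variables in that file are named. It only handles simple `name = "value"` string assignments on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set `name = "value"` in an HCL variables file by
# rewriting the existing single-line string assignment for that name.
# Usage: set_pkrvar <file> <name> <value>
set_pkrvar() {
  local file="$1" name="$2" value="$3" tmp
  tmp="$(mktemp)"
  # Replace only the quoted value, keeping the original spacing around "=".
  # Write through a temp file so the helper works with GNU and BSD sed alike.
  sed -E "s|^(${name}[[:space:]]*=[[:space:]]*)\"[^\"]*\"|\1\"${value}\"|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

A call such as `set_pkrvar variables.auto.pkrvars.hcl runner_version 2.294.0` would leave every other line of the file untouched, which keeps the resulting PR diff to a single line.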