Brainstorm meeting (Feb 16th 2021)
Kenneth Hoste edited this page Feb 16, 2021 · 2 revisions
- see https://github.com/EESSI/gentoo-overlay/issues/40
- Bob: may not work, eselect tool expects a Git repo?
- unless we configure it "manually"
- one-time cost, automated via Ansible playbook
- cfr. science-overlay
- security notifications sent by Gentoo team are taken into account
- recently: Python (glibc didn't affect us at this time)
- fixed by Bob for `x86_64` on Stratum-0, but not yet for Arm64 & POWER (?)
- currently done manually...
- to automate:
- run update command in writable overlay + publish to Stratum-0
- multi-step approach:
- (optional) dry run
- automate update procedure
- via Ansible playbook?
- pass packages to update as required argument?
- check what has changed (`cvmfs_server diff --worktree`)
- PR to private repo that has to be approved?
- 2nd person approves actual update?
- run smoke tests
- push to Stratum-0 via publisher machine
- also needs documented procedure
- so people other than Bob can also tackle this
- two-pairs-of-eyes-required policy?
- rollback procedure
- email alerts when stuff is published + persistent logging
- perhaps even central ELK setup?
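The multi-step approach above could be sketched roughly as follows. This only illustrates the control flow and is not an existing EESSI tool: `run_update` and all callables passed to it (`update`, `diff`, `approve`, `smoke_test`, `publish`, `rollback`) are hypothetical names. In practice each would probably wrap an Ansible playbook or a command on the publisher machine.

```python
from typing import Callable, List


def run_update(
    packages: List[str],
    update: Callable[[List[str], bool], None],
    diff: Callable[[], List[str]],
    approve: Callable[[List[str]], bool],
    smoke_test: Callable[[], bool],
    publish: Callable[[], None],
    rollback: Callable[[], None],
    dry_run: bool = False,
) -> List[str]:
    """Run the update steps in order and return a log of what happened."""
    if not packages:
        raise ValueError("packages to update are a required argument")

    log = []
    # Run the update in the writable overlay (or only simulate it)
    update(packages, dry_run)
    log.append(("dry run: " if dry_run else "updated: ") + " ".join(packages))
    if dry_run:
        return log  # nothing was changed, so nothing to review or publish

    # Check what has changed
    changes = diff()
    log.append(f"changed paths: {len(changes)}")

    # A second person approves the actual update
    if not approve(changes):
        rollback()
        log.append("rejected by second reviewer, rolled back")
        return log

    # Run smoke tests before anything reaches Stratum-0
    if not smoke_test():
        rollback()
        log.append("smoke tests failed, rolled back")
        return log

    # Push to Stratum-0
    publish()
    log.append("published")
    return log
```

The returned log is one possible hook for the email alerts and persistent logging mentioned above.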
- temporary procedure
- only step 1) + create tarball
- remove existing compat dir + unpack tarball + ingest
- annoying, can't ingest tarball at once?
- perhaps via `-d` option to delete folder first?
- publish to test repo first to test?
- via CernVM-FS variant symlinks?
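For the temporary tarball-based procedure, the "remove existing dir + unpack + ingest" step might be assembled as below. The sketch only builds the argument list and does not run anything. It assumes the `--tar_file`, `--base_dir` and `--delete` options of `cvmfs_server ingest`, and that combining `--delete` with `--tar_file` removes the old directory before unpacking; both should be verified against the installed CernVM-FS version. The function name, repository name and paths are placeholders.

```python
from typing import List, Optional


def ingest_command(repo: str, tarball: str, base_dir: str,
                   delete_dir: Optional[str] = None) -> List[str]:
    """Build (but do not run) a `cvmfs_server ingest` command line."""
    cmd = ["cvmfs_server", "ingest", "--tar_file", tarball, "--base_dir", base_dir]
    if delete_dir is not None:
        # Assumption: the path is given relative to the repository root
        cmd += ["--delete", delete_dir]
    cmd.append(repo)
    return cmd
```

On the publisher machine the result could be passed to `subprocess.run(..., check=True)`, ideally against a test repository first.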
- monitoring: setup in AWS?
- see https://github.com/EESSI/compatibility-layer/issues/71
- only relevant for `nvcc` (CUDA compiler) or also for runtime libraries?
- go with `/usr/lib64/eessi` (or better `/opt/eessi/lib64`) rather than `/usr/lib64/nvidia` like ComputeCanada does, as a more generic solution?
- could also be used to allow overriding of, for example, `libmpich.so`
- pass via `user-defined-trusted-dirs` in `glibc.ebuild`
- inject first into RPATH via EasyBuild wrappers
- TODO for 2021.02 update of compat layer?
- are we actually allowed to redistribute a CUDA installation?
- Kenneth can check with contacts at NVIDIA
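To illustrate why an extra library directory such as `/opt/eessi/lib64` would allow overriding libraries, the sketch below models a first-match lookup over an ordered list of directories. It is a deliberately simplified model, not the real dynamic loader: there is no cache, no `LD_LIBRARY_PATH`, and the actual order in which RPATH entries and trusted directories are consulted is not modelled. `find_library` is a hypothetical helper.

```python
import os
from typing import List, Optional


def find_library(name: str, search_dirs: List[str]) -> Optional[str]:
    """Return the first existing path for `name`, searching dirs in order."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With the override directory listed first (for example, injected at the front of RPATH), a host-provided `libmpich.so` placed there would be found before the copy shipped in the software layer.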
- Terraform stuff, see `master` branch at https://github.com/terjekv/compatibility-layer
- docs at https://github.com/terjekv/compatibility-layer/blob/master/eessi-infrastructure.py
- build compat layer from scratch in AWS
- can be extended to also support OpenStack (for POWER) + use local resources (on HPC clusters)
- should become separate repo => https://github.com/EESSI/infrastructure
- see https://github.com/EESSI/compatibility-layer/issues/42
- set up GitHub workflow to run smoke tests (presence of files, quick commands, etc.)
- starting point for running checks in GitHub CI: https://github.com/boegel/software-layer/blob/ci_pilot_repo/.github/workflows/pilot_repo.yml
- commands to run should be in a script, so it can also be run in Ansible playbook
- trivial for x86_64
- can also be done for aarch64 and ppc64le via self-hosted runners
- runner may not be supported on POWER
- but we can write our own GitHub App for this
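A smoke-test script along the lines discussed above (presence of files, quick commands) might look like this minimal sketch. The function name, the 60-second timeout and the example arguments are assumptions; the actual list of files and commands to check still has to be decided.

```python
import os
import subprocess
from typing import List, Sequence


def run_smoke_tests(prefix: str, required_files: Sequence[str],
                    commands: Sequence[Sequence[str]]) -> List[str]:
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    # Presence of files, relative to the compat layer prefix
    for rel_path in required_files:
        if not os.path.exists(os.path.join(prefix, rel_path)):
            failures.append(f"missing file: {rel_path}")
    # Quick commands that should exit with status 0
    for cmd in commands:
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=60)
        except (OSError, subprocess.TimeoutExpired) as err:
            failures.append(f"could not run {' '.join(cmd)}: {err}")
            continue
        if result.returncode != 0:
            failures.append(f"command failed ({result.returncode}): {' '.join(cmd)}")
    return failures
```

If the script exits non-zero whenever the returned list is non-empty, the same file can be called from a GitHub workflow and from an Ansible playbook.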
- Works as wanted, but requires some manual steps for now because upstream fixes haven't been merged yet
- start project dashboard
- make glibc pick up on `/opt/eessi/lib64` (see above)
- collapse `gentoo-overlay` into `compatibility-layer`
- Lmod & slotted Lua stuff
- see https://github.com/EESSI/gentoo-overlay/pull/44 + https://github.com/EESSI/compatibility-layer/pull/69
- balance between pulling in stuff from upstream Gentoo vs stability
- More packages?
- `pkgconfig` (see https://github.com/EESSI/compatibility-layer/issues/47#issuecomment-744015556)
- `Bison`, `DBus`, `makeinfo`
- Update to next EasyBuild version
- Do we also need an `EESSI/common` repo for common configuration & stuff?
- Other repos could access this via submodules?
- CI for our Ansible playbooks now sometimes fails because of upstream changes in Gentoo repositories
- Can we avoid this?
- Clean up `gentoo-overlay` when collapsing it into `compatibility-layer`
- only copy over stuff that we actually need (Lmod, archspec, ...)
- We should start labeling issues (cfr. filesystem-layer)