
Brainstorm meeting (Feb 16th 2021)

Kenneth Hoste edited this page Feb 16, 2021 · 2 revisions

Collapsing gentoo-overlay into compatibility-layer repo

Security updates

  • security notifications sent by Gentoo team are taken into account
    • recent example: Python (the glibc advisory didn't affect us this time)
    • fixed by Bob for x86_64 on Stratum-0, but not yet for Arm64 & POWER (?)
  • currently done manually...
  • to automate:
    • run update command in writable overlay + publish to Stratum-0
    • multi-step approach:
        1. (optional) dry run
        2. automate update procedure
            • via Ansible playbook?
            • pass packages to update as required argument?
        3. check what has changed (cvmfs_server diff --worktree)
            • PR to private repo that has to be approved?
            • 2nd person approves actual update?
        4. run smoke tests
        5. push to Stratum-0 via publisher machine
  • also needs a documented procedure
    • so people other than Bob can also tackle this
    • policy requiring two pairs of eyes?
    • rollback procedure
  • email alerts when something is published + persistent logging
    • perhaps even a central ELK (Elasticsearch, Logstash, Kibana) setup?
  • temporary procedure
    • only step 1) + create tarball
    • remove existing compat dir + unpack tarball + ingest
      • annoying: can't we do this with a single ingest of the tarball?
      • perhaps via -d option to delete folder first?
  • publish to a test repository first?
    • via CernVM-FS variant symlinks?
  • monitoring: set up in AWS?
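The update procedure discussed above could be sketched as a small Python planner that only builds the command sequence, so it can be reviewed (and approved by a second person) before anything touches Stratum-0. This is a hypothetical illustration, not an agreed implementation. The repository name, the `run_smoke_tests.sh` script, calling `emerge` directly (instead of from within the Gentoo Prefix environment), the `--worktree` flag of `cvmfs_server diff`, and the assumption that `cvmfs_server ingest` accepts `--delete` together with `--tar_file` in one call should all be verified before use.

```python
# Hypothetical sketch of the (security) update procedure for the compat layer.
# Commands are only planned; they are executed only if execute=True is passed.
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    name: str
    cmd: List[str]
    needs_approval: bool = False  # a second person must approve before this step


def plan_update(repo: str, packages: List[str], dry_run: bool = True) -> List[Step]:
    """Plan the multi-step procedure; packages to update are a required argument."""
    if not packages:
        raise ValueError("list the packages to update explicitly")
    steps: List[Step] = []
    if dry_run:
        # 1. (optional) dry run: show what would be updated
        steps.append(Step("dry-run", ["emerge", "--pretend", "--update", *packages]))
    steps += [
        # 2. open a transaction (writable overlay on the publisher) and update
        Step("transaction", ["cvmfs_server", "transaction", repo]),
        # assumption: this emerge is the one from the Gentoo Prefix compat layer
        Step("update", ["emerge", "--oneshot", "--update", *packages]),
        # 3. show what changed in the open transaction
        # assumption: --worktree flag; verify against `cvmfs_server diff` help
        Step("diff", ["cvmfs_server", "diff", "--worktree", repo]),
        # 4. smoke tests (hypothetical script name)
        Step("smoke-tests", ["./run_smoke_tests.sh"]),
        # 5. publish to Stratum-0, gated on approval by a second person
        Step("publish", ["cvmfs_server", "publish", repo], needs_approval=True),
    ]
    return steps


def plan_tarball_ingest(repo: str, compat_dir: str, tarball: str) -> List[Step]:
    """Temporary procedure: replace compat_dir (relative to the repo root)."""
    # assumption: --delete can be combined with --tar_file in a single ingest
    cmd = ["cvmfs_server", "ingest", "--delete", compat_dir,
           "--tar_file", tarball, "--base_dir", compat_dir, repo]
    return [Step("ingest", cmd, needs_approval=True)]


def run(steps: List[Step], approved_by: Optional[str] = None,
        execute: bool = False) -> List[str]:
    """Walk the plan, stopping at unapproved gates; returns log lines to persist."""
    log: List[str] = []
    for step in steps:
        if step.needs_approval and not approved_by:
            log.append(f"STOP before {step.name}: approval required")
            break
        log.append(f"{step.name}: {' '.join(step.cmd)}")
        if execute:
            subprocess.run(step.cmd, check=True)
    return log
```

Calling `run(plan_update(...))` without `approved_by` stops right before publishing, which mirrors the two-pairs-of-eyes idea; the returned log lines could be written to persistent storage or sent as an email alert. With `execute=True`, a failing step raises `CalledProcessError` and leaves the transaction open, to be cleaned up with `cvmfs_server abort`; an already published revision can be reverted with `cvmfs_server rollback`.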

CUDA

  • see https://github.com/EESSI/compatibility-layer/issues/71
  • only relevant for nvcc (CUDA compiler) or also for runtime libraries?
  • as a more generic solution, go with /usr/lib64/eessi (or, better, /opt/eessi/lib64) rather than /usr/lib64/nvidia as Compute Canada does?
    • could also be used to override other libraries, e.g. libmpich.so
    • pass it via user-defined-trusted-dirs in glibc.ebuild
    • inject it first into RPATH via the EasyBuild RPATH wrappers
    • TODO for 2021.02 update of compat layer?
  • are we actually allowed to redistribute a CUDA installation?
    • Kenneth can check with contacts at NVIDIA
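The RPATH part of this idea can be illustrated with a small Python model: a compiler-wrapper step that puts the override directory first, and a first-match lookup across the resulting search directories. This is a hypothetical sketch; `inject_rpath`, `rpath_dirs`, `resolve` and the library names in the example are illustrative only. The real EasyBuild RPATH wrappers and the glibc dynamic linker do more (RPATH vs. RUNPATH, `LD_LIBRARY_PATH`, the ld.so cache, and trusted directories set via `user-defined-trusted-dirs` are not modeled).

```python
# Hypothetical sketch: search an EESSI-controlled directory before anything else.
import os
from typing import Dict, List, Optional, Set

# directory proposed in the notes; not a decided location
OVERRIDE_DIR = "/opt/eessi/lib64"
RPATH_FLAG = "-Wl,-rpath,"


def inject_rpath(args: List[str]) -> List[str]:
    """Compiler-wrapper style: put OVERRIDE_DIR before any other -rpath flag.

    The linker records -rpath entries in command-line order, so prepending
    makes OVERRIDE_DIR the first RPATH entry.
    """
    return [RPATH_FLAG + OVERRIDE_DIR, *args]


def rpath_dirs(args: List[str]) -> List[str]:
    """Extract the RPATH entries (in order) from a linker command line."""
    return [a[len(RPATH_FLAG):] for a in args if a.startswith(RPATH_FLAG)]


def resolve(lib: str, search_dirs: List[str],
            available: Dict[str, Set[str]]) -> Optional[str]:
    """First match wins; `available` stands in for the real file system."""
    for d in search_dirs:
        if lib in available.get(d, set()):
            return os.path.join(d, lib)
    return None
```

For example, if `/opt/eessi/lib64` contains a symlink to a site-provided `libmpich.so.12`, `resolve` picks that copy over the one shipped in the software stack, while libraries not present in the override directory still resolve to their usual location.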

Scripts from Terje

Validation testing

POWER support

  • Works as intended, but currently requires some manual steps because upstream fixes haven't been merged yet

Next release (2021.02)

Other

  • Do we also need an EESSI/common repo for shared configuration and other common files?
    • Other repos could access this via submodules?
  • CI for our Ansible playbooks now sometimes fails because of upstream changes in Gentoo repositories
    • Can we avoid this?
  • Clean up gentoo-overlay when collapsing it into compatibility-layer
    • only copy over stuff that we actually need (Lmod, archspec, ...)
  • We should start labeling issues (cf. filesystem-layer)
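Copying over only what is actually needed from gentoo-overlay could be done with a small script along these lines. This is a hypothetical sketch: `copy_needed` is an illustrative name, and the `<category>/<package>` entries used with it (e.g. `app-shells/lmod`) are placeholders, not the actual layout of either repository.

```python
# Hypothetical sketch: copy only the needed packages when folding
# gentoo-overlay into the compatibility-layer repo.
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_needed(overlay: Path, target: Path, packages: Iterable[str]) -> List[str]:
    """Copy the <category>/<package> directories listed in `packages`.

    Returns the entries that were not found in the overlay, so the list of
    needed packages can be double-checked. Repository metadata such as
    profiles/repo_name and metadata/layout.conf is not handled here.
    """
    missing: List[str] = []
    for pkg in packages:
        src = overlay / pkg
        if not src.is_dir():
            missing.append(pkg)
            continue
        dst = target / pkg
        dst.parent.mkdir(parents=True, exist_ok=True)
        # dirs_exist_ok requires Python 3.8+
        shutil.copytree(src, dst, dirs_exist_ok=True)
    return missing
```

Returning the missing entries, rather than failing on the first one, makes it easy to review the whole list of packages to keep in one go.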