ceph-upgrade-tool.py is a Python script developed as a second-tier tool for watching Ceph upgrades, in order to better integrate the Ceph upgrade process with the upgrade workflow tooling.
ceph-upgrade-tool.py is called by the storage node upgrade Argo Workflow and is not normally run manually.
However, for troubleshooting purposes, or in a specific instance where ceph-upgrade-tool.py must be run manually, this documentation provides an overview of the tool.
The script can be run from master nodes.
(ncn-m#) Run the following command to see the tool usage.
/usr/share/doc/csm/upgrade/scripts/ceph/ceph-upgrade-tool.py --help
Example output:
usage: ceph-upgrade-tool.py [-h] --version VERSION [--print_basic]
Ceph upgrade script
optional arguments:
-h, --help show this help message and exit
--version VERSION The target version to upgrade Ceph to. Format example
v15.2.15
--print_basic Basic status will be printed in text. A pretty-table will
not be printed.
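For readers who want to reason about how these two flags behave, the following is a minimal sketch of an argparse parser that accepts the same options as the usage text above. It is not the tool's actual source; the function name build_parser is hypothetical, and the only details taken from this page are the flag names and help strings.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same options as the usage text (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="ceph-upgrade-tool.py",
        description="Ceph upgrade script",
    )
    # --version is mandatory, matching the usage line where it has no brackets.
    parser.add_argument(
        "--version",
        required=True,
        help="The target version to upgrade Ceph to. Format example v15.2.15",
    )
    # --print_basic is a switch: off unless the flag is present.
    parser.add_argument(
        "--print_basic",
        action="store_true",
        help="Basic status will be printed in text. A pretty-table will not be printed.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.version, args.print_basic)
```

Under this sketch, omitting --version makes argparse exit with a usage error, and --print_basic defaults to False.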
To upgrade to Ceph version x.y.z, use the following command.
/usr/share/doc/csm/upgrade/scripts/ceph/ceph-upgrade-tool.py --version "x.y.z"
The ceph-upgrade-tool.py tool starts a Ceph upgrade to the version provided. It does this in the following way.
- It verifies that the Ceph version provided is valid.
- It verifies that the Ceph container image can be pulled from Nexus. It specifically tries to pull the container image from registry.local/artifactory.algol60.net/csm-docker/stable/quay.io/ceph/ceph:v<input_version>.
- If the container exists in Nexus, then the script will start a Ceph upgrade by running ceph orch upgrade start --image <container_image>.
- It then monitors the upgrade by running ceph orch upgrade status and printing a pretty-table of the results.
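The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's real code: the function names are hypothetical, the validity check (three dot-separated integers with an optional leading v) is a guess at what "valid" means, and the image pull check and status monitoring are left out because this page does not say how the tool performs them. Only the registry path and the ceph orch upgrade start command come from the text above.

```python
import re
import subprocess
from typing import List

# Registry path quoted in the description above.
IMAGE_PREFIX = (
    "registry.local/artifactory.algol60.net/csm-docker/stable/quay.io/ceph/ceph"
)


def normalize_version(version: str) -> str:
    """Return the version as x.y.z, or raise ValueError.

    Assumption: a valid version is three dot-separated integers,
    optionally prefixed with "v". The real tool may check differently.
    """
    match = re.fullmatch(r"v?(\d+\.\d+\.\d+)", version.strip())
    if match is None:
        raise ValueError(f"unexpected Ceph version format: {version!r}")
    return match.group(1)


def image_for_version(version: str) -> str:
    """Build the container image reference for a target version."""
    return f"{IMAGE_PREFIX}:v{normalize_version(version)}"


def upgrade_start_command(image: str) -> List[str]:
    """Return the argument list for starting the orchestrated upgrade."""
    return ["ceph", "orch", "upgrade", "start", "--image", image]


def start_upgrade(version: str, run=subprocess.run) -> str:
    """Validate the version, then start the upgrade; returns the image used.

    `run` is injectable so the flow can be exercised without a cluster.
    """
    image = image_for_version(version)
    run(upgrade_start_command(image), check=True)
    return image
```

Passing a stand-in for run makes it possible to inspect the command that would be issued without touching a live cluster.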
- To manually check the status of a Ceph upgrade, run ceph orch upgrade status.
- To stop a Ceph upgrade, run ceph orch upgrade stop.
- If an upgrade appears stuck, make sure all of the mgr daemons have been upgraded. The three mgr daemons should be the first to upgrade. If only one or two have upgraded and the third is not being upgraded for some reason, try the following steps.
  - Stop the current upgrade.
    ceph orch upgrade stop
  - Manually try to force the mgr daemon onto the new container image. Set the container image that the mgr should be upgraded to and set the name of the mgr daemon that needs to be upgraded.
    container_image="registry.local/artifactory.algol60.net/csm-docker/stable/quay.io/ceph/ceph:v<version>"
    mgr_daemon="mgr.ncn-s00X.xxxxx"
    ceph orch daemon redeploy $mgr_daemon $container_image
    If the above command fails, try running ceph mgr fail and then rerunning the command above.
  - Once all three mgr daemons are running the upgraded container image, restart the Ceph upgrade. Restart the upgrade by running ceph-upgrade-tool.py or by manually restarting it with ceph orch upgrade start --image $container_image.
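When deciding which mgr daemon still needs a manual redeploy, a small helper along these lines may be useful. It is a sketch only: it assumes the daemon list has already been obtained as parsed JSON (for example from ceph orch ps --format json) and that each entry carries daemon_type, daemon_id, and container_image_name fields. Those field names are assumptions to verify against the cluster's actual output, and both function names are hypothetical.

```python
from typing import Dict, List


def mgrs_not_on_image(daemons: List[Dict], target_image: str) -> List[str]:
    """Return names of mgr daemons that are not running target_image.

    Assumption: each dict has "daemon_type", "daemon_id", and
    "container_image_name" keys; entries for other daemon types are ignored.
    """
    stale = []
    for daemon in daemons:
        if daemon.get("daemon_type") != "mgr":
            continue
        if daemon.get("container_image_name") != target_image:
            stale.append(f"mgr.{daemon.get('daemon_id')}")
    return stale


def redeploy_command(mgr_daemon: str, image: str) -> List[str]:
    """Return the argument list for redeploying one daemon onto an image."""
    return ["ceph", "orch", "daemon", "redeploy", mgr_daemon, image]
```

An empty result from mgrs_not_on_image suggests every mgr daemon in the list is already on the target image, at which point the upgrade can be restarted.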