kubectl set image deployment/nginx container_name=$image
Because kubectl set image does not work on K8s custom resources (CRDs), the only way I could update the image of a Ray cluster was kubectl patch.
kubectl get crd
alertmanagerconfigs.monitoring.coreos.com 2022-03-15T10:13:02Z
podmonitors.monitoring.coreos.com 2022-03-15T10:13:02Z
probes.monitoring.coreos.com 2022-03-15T10:13:02Z
prometheusrules.monitoring.coreos.com 2022-03-15T10:13:02Z
proxyconfigs.networking.istio.io 2022-04-29T10:18:07Z
rayclusters.cluster.ray.io 2022-06-11T01:28:47Z
$ kubectl describe crd rayclusters.cluster.ray.io | grep Singular -B5
Group: cluster.ray.io
Kind: RayCluster
List Kind: RayClusterList
Plural: rayclusters
Singular: raycluster
Accepted Names:
Kind: RayCluster
List Kind: RayClusterList
Plural: rayclusters
Singular: raycluster
OK, so now I know that the rayclusters.cluster.ray.io CRD is what provides the raycluster resource.
$ kubectl explain raycluster
KIND: RayCluster
VERSION: cluster.ray.io/v1
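Use kubectl get raycluster to find the target object to modify.
kubectl describe prints something that looks like YAML but is not, so I still had to use get -o json.
A colleague said patch would work, and I found this way of writing it: https://stackoverflow.com/a/36924484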
kubectl patch deployment myapp-deployment -p '{"spec":{"template":{"spec":{"containers":[{"name":"myapp","image":""}]}}}}'
Ugh. Hand-writing JSON that long is very error-prone, so let's look for another way.
$ kubectl get -o json raycluster cluster-b | jq '.spec.podTypes[].name'
$ kubectl get -o json raycluster idp-raycluster-b-1563128371764129792 | jq '.spec.podTypes[].podConfig.spec.containers[].image'
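A more sensible approach is to use a tool such as jq or yq and address the JSON field to modify with a path, much like JSON Pointer
(jq's path syntax is not actually JSON Pointer)
Note that the yq installed by pacman/pip3 is the less widely used one; use the Go yq documented at mikefarah.gitbook.io
But I feel that focusing on jq is enough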
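jq paths and JSON Pointer (RFC 6901) address the same field with different syntax. Here is a minimal Python sketch of the conversion, assuming simple paths made only of `.key` and `[index]` segments. The helper name `jq_path_to_pointer` is made up for illustration and is not part of jq or kubectl.

```python
import re

def jq_path_to_pointer(path: str) -> str:
    """Convert a simple jq path such as .a.b[0].c to a JSON Pointer (RFC 6901).

    Assumption: only .key and [index] segments are supported; quoted keys,
    slices, and iterators such as [] are rejected.
    """
    tokens = re.findall(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]", path)
    # Rebuild the path from the tokens; any leftover character means
    # the path used syntax this sketch does not understand.
    rebuilt = "".join(f".{key}" if key else f"[{index}]" for key, index in tokens)
    if rebuilt != path:
        raise ValueError(f"unsupported jq path: {path!r}")
    parts = []
    for key, index in tokens:
        segment = key if key else index
        # RFC 6901 escaping: "~" becomes "~0" and "/" becomes "~1".
        parts.append(segment.replace("~", "~0").replace("/", "~1"))
    return "".join("/" + part for part in parts)

print(jq_path_to_pointer(".spec.podTypes[0].podConfig.spec.containers[0].image"))
# prints /spec/podTypes/0/podConfig/spec/containers/0/image
```

The pointer form is what kubectl patch --type=json expects in its path field.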
$ yq --null-input '.spec.podTypes[0].podConfig.spec.containers[0].image="content"'
spec:
  podTypes:
    - podConfig:
        spec:
          containers:
            - image: content
$ jq -n ".spec.podTypes[0].podConfig.spec.containers[0].image = 123"
{
  "spec": {
    "podTypes": [
      {
        "podConfig": {
          "spec": {
            "containers": [
              {
                "image": 123
              }
            ]
          }
        }
      }
    ]
  }
}
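But then a problem: written like this, jq gives no obvious way to set different images for the head and the worker (chaining two assignments with |, one per index, would in fact do it)
Trying patch: at first a bash multi-line string stripped the double quotes from the JSON; after fixing that I hit this error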
Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json, application/apply-patch+yaml
It turned out I had not passed the --type=merge flag; with it, the error went away
patch_json=$(python3 -c 'print("""
{
  "spec": {
    "podTypes": [
      {
        "name": "rayHeadType",
        "podConfig": {"spec": {"containers": [{"image": "$repo/$base"}]}}
      },
      {
        "name": "rayWorkerType",
        "podConfig": {"spec": {"containers": [{"image": "$repo/$base"}]}}
      }
    ]
  }
}
""")')
ssh server "kubectl -n $namespace patch raycluster idp-raycluster-b-$team_id --patch '$patch_json' --type merge"
The RayCluster "idp-raycluster-b-1563300506009985024" is invalid:
* spec.podTypes.podConfig.spec.containers.name: Required value
* spec.podTypes[0].podConfig.apiVersion: Required value: must not be empty
* spec.podTypes[0].podConfig.kind: Required value: must not be empty
* spec.podTypes[1].podConfig.apiVersion: Required value: must not be empty
* spec.podTypes[1].podConfig.kind: Required value: must not be empty
Fortunately the K8s error messages are detailed enough to fix this
patch_json_template='
{
  "spec": {
    "podTypes": [
      {
        "name": "rayHeadType",
        "podConfig": {
          "kind": "Pod",
          "apiVersion": "v1",
          "spec": {"containers": [{"name": "ray-node", "image": "%s"}]}
        }
      },
      {
        "name": "rayWorkerType",
        "podConfig": {
          "kind": "Pod",
          "apiVersion": "v1",
          "spec": {"containers": [{"name": "ray-node", "image": "%s"}]}
        }
      }
    ]
  }
}'
patch_json=$(printf "$patch_json_template" "$repo/$header_img" "$repo/$worker_img")
ssh server "kubectl -n $namespace patch raycluster app --patch '$patch_json' --type merge"
raycluster.cluster.ray.io/app patched
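The quoting trouble above comes from hand-writing JSON inside shell strings. As a rough alternative, the same patch body can be built from a Python dict and quoted for the remote shell. This is only a sketch: `build_merge_patch`, the image names, and `my-namespace` are placeholders made up for illustration, and the pod type and container names are the ones used in this post. It produces the same merge patch as the template above, so it only addresses the quoting, not what the patch does to the cluster.

```python
import json
import shlex

def build_merge_patch(head_image: str, worker_image: str) -> str:
    """Return the merge-patch body as a JSON string (hypothetical helper)."""
    def pod_type(name: str, image: str) -> dict:
        return {
            "name": name,
            "podConfig": {
                "kind": "Pod",
                "apiVersion": "v1",
                "spec": {"containers": [{"name": "ray-node", "image": image}]},
            },
        }

    patch = {"spec": {"podTypes": [
        pod_type("rayHeadType", head_image),
        pod_type("rayWorkerType", worker_image),
    ]}}
    return json.dumps(patch)

# Placeholder image names, not taken from the post.
patch_json = build_merge_patch("example.registry/ray-head:1", "example.registry/ray-worker:1")

# shlex.quote keeps the JSON intact when the remote shell parses the command.
remote_cmd = ("kubectl -n my-namespace patch raycluster app --type merge --patch "
              + shlex.quote(patch_json))
print(remote_cmd)
# The string could then be run with: subprocess.run(["ssh", "server", remote_cmd])
```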
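The patch finally applied, but the pods belonging to this custom resource did not restart, which points to a configuration error. describe raycluster showed that it had indeed failed
and was stuck in the AutoscalingExceptionRecovery state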
Error Logging 18m kopf Handler 'create_or_update_cluster' failed with an exception. Will retry.
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.9/site-packages/kopf/_core/actions/execution.py", line 279, in execute_handler_once
result = await invoke_handler(
File "/home/ray/anaconda3/lib/python3.9/site-packages/kopf/_core/actions/execution.py", line 374, in invoke_handler
result = await invocation.invoke(
File "/home/ray/anaconda3/lib/python3.9/site-packages/kopf/_core/actions/invocation.py...packages/ray/ray_operator/operator.py", line 282, in _create_or_update_cluster
cluster_config = operator_utils.cr_to_config(cluster_cr_body)
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/ray_operator/operator_utils.py", line 111, in cr_to_config
config["available_node_types"] = get_node_types(
File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/ray_operator/operator_utils.py", line 133, in get_node_types
metadata = node_type["node_config"]["metadata"]
KeyError: 'metadata'
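It turned out that --type=merge had wiped out the other fields of the original JSON... A JSON merge patch replaces arrays wholesale, so my patch overwrote the entire podTypes list (dropping metadata, among other fields) instead of updating it field by field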
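To see why the fields disappeared, here is a small Python sketch of JSON Merge Patch semantics as described in RFC 7386. The `live` object is a heavily simplified stand-in for the real RayCluster, and its field values (including `maxWorkers` and `generateName`) are made up for illustration.

```python
import copy

def json_merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7386) and return the result."""
    if not isinstance(patch, dict):
        # Anything that is not an object, including a list, replaces the target.
        return copy.deepcopy(patch)
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null deletes the key
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result

# Simplified stand-in for the live object (values are made up).
live = {"spec": {"maxWorkers": 3, "podTypes": [
    {"name": "rayHeadType",
     "podConfig": {"metadata": {"generateName": "head-"},
                   "spec": {"containers": [{"name": "ray-node", "image": "old"}]}}},
]}}
patch = {"spec": {"podTypes": [
    {"name": "rayHeadType",
     "podConfig": {"spec": {"containers": [{"name": "ray-node", "image": "new"}]}}},
]}}

merged = json_merge_patch(live, patch)
print(merged["spec"]["maxWorkers"])                               # prints 3
print("metadata" in merged["spec"]["podTypes"][0]["podConfig"])  # prints False
```

Object keys next to podTypes survive the merge, but everything inside the list that the patch did not repeat is gone, which matches the KeyError: 'metadata' in the operator traceback.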
ssh server "kubectl -n $namespace patch raycluster app --type=json --patch='[
{\"op\": \"replace\", \"path\": \"/spec/podTypes/0/podConfig/spec/containers/0/image\", \"value\": \"$repo/$base\"},
{\"op\": \"replace\", \"path\": \"/spec/podTypes/1/podConfig/spec/containers/0/image\", \"value\": \"$repo/$base\"}
]'"
The path syntax here is clearly JSON Pointer. At last the patch took effect, and the pods belonging to the custom resource restarted
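For contrast with the merge patch, here is a minimal Python sketch of a single JSON Patch (RFC 6902) "replace" operation addressed by a JSON Pointer. It only handles "replace", and the `live` object is again a simplified, made-up stand-in for the RayCluster.

```python
import copy

def json_patch_replace(doc, pointer: str, value):
    """Apply one RFC 6902 "replace" operation and return a patched copy."""
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer to a field: {pointer!r}")
    # RFC 6901 unescaping: "~1" is "/" and "~0" is "~", in that order.
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]
    result = copy.deepcopy(doc)
    node = result
    for token in tokens[:-1]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    last = tokens[-1]
    if isinstance(node, list):
        node[int(last)] = value  # IndexError if the element does not exist
    elif last in node:
        node[last] = value
    else:
        raise KeyError(f"replace target does not exist: {pointer}")
    return result

# Simplified stand-in for the live object (values are made up).
live = {"spec": {"podTypes": [
    {"name": "rayHeadType",
     "podConfig": {"metadata": {"generateName": "head-"},
                   "spec": {"containers": [{"name": "ray-node", "image": "old"}]}}},
]}}

patched = json_patch_replace(live, "/spec/podTypes/0/podConfig/spec/containers/0/image", "new")
pod_config = patched["spec"]["podTypes"][0]["podConfig"]
print(pod_config["spec"]["containers"][0]["image"])  # prints new
print("metadata" in pod_config)                      # prints True
```

Only the addressed field changes; the surrounding list elements and their metadata stay as they were.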
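Custom resources (CRDs) do not support set image
So I patched with JSON and got the unknown format error
Adding the --type=merge flag made the patch succeed, but it overwrote the whole podTypes list
A colleague suggested kubectl patch --type=json -p '[{"op":"replace","path":"/spec/image","value":"a"}]'
The path uses JSON Pointer syntax, so it updates just that one field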
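Since JSON Patch paths use list indices, giving the head and the worker different images means knowing where each pod type sits in the list. A possible Python sketch that derives the operations from the output of kubectl get -o json is shown here. `image_replace_ops`, the image names, and the sample `cluster` dict are made up for illustration.

```python
import json

def image_replace_ops(cluster: dict, images: dict, container: str = "ray-node") -> list:
    """Build JSON Patch "replace" ops that set the image of each named pod type.

    cluster: parsed output of `kubectl get -o json raycluster <name>`.
    images:  maps a podType name to its new image.
    """
    ops = []
    for i, pod_type in enumerate(cluster["spec"]["podTypes"]):
        image = images.get(pod_type["name"])
        if image is None:
            continue  # this pod type is not being updated
        for j, c in enumerate(pod_type["podConfig"]["spec"]["containers"]):
            if c["name"] == container:
                ops.append({
                    "op": "replace",
                    "path": f"/spec/podTypes/{i}/podConfig/spec/containers/{j}/image",
                    "value": image,
                })
    return ops

# Made-up sample; the worker is deliberately listed first to show the index lookup.
cluster = {"spec": {"podTypes": [
    {"name": "rayWorkerType",
     "podConfig": {"spec": {"containers": [{"name": "ray-node", "image": "old"}]}}},
    {"name": "rayHeadType",
     "podConfig": {"spec": {"containers": [{"name": "ray-node", "image": "old"}]}}},
]}}

ops = image_replace_ops(cluster, {
    "rayHeadType": "example.registry/head:2",
    "rayWorkerType": "example.registry/worker:2",
})
print(json.dumps(ops, indent=2))
```

The printed list can be passed as the body of kubectl patch --type=json.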