KubeKey 升级 Kubernetes 次要版本实战指南

2023-12-12 17:33 由 kubesphere 发表于 #其他

作者：运维有术

前言

知识点

定级：入门级
KubeKey 如何升级 Kubernetes 次要版本
Kubernetes 升级准备及验证
KubeKey 升级 Kubernetes 的常见问题

实战服务器配置 (架构 1:1 复刻小规模生产环境，配置略有不同)

主机名	IP	CPU	内存	系统盘	数据盘	用途
k8s-master-1	192.168.9.91	4	16	40	100	KubeSphere/k8s-master
k8s-master-2	192.168.9.92	4	16	40	100	KubeSphere/k8s-master
k8s-master-3	192.168.9.93	4	16	40	100	KubeSphere/k8s-master
k8s-worker-1	192.168.9.95	8	16	40	100	k8s-worker/CI
k8s-worker-2	192.168.9.96	8	16	40	100	k8s-worker
k8s-worker-3	192.168.9.97	8	16	40	100	k8s-worker
k8s-storage-1	192.168.9.81	4	16	40	100/100/100/100/100	ElasticSearch/GlusterFS/Ceph-Rook/Longhorn/NFS/
k8s-storage-2	192.168.9.82	4	16	40	100/100/100/100	ElasticSearch/GlusterFS/Ceph-Rook/Longhorn/
k8s-storage-3	192.168.9.83	4	16	40	100/100/100/100	ElasticSearch/GlusterFS/Ceph-Rook/Longhorn/
registry	192.168.9.80	4	8	40	100	Sonatype Nexus 3
合计	10	52	152	400	2000

实战环境涉及软件版本信息

操作系统：CentOS 7.9 x86_64
KubeSphere：v3.4.1
Kubernetes：v1.24.14 to v1.26.5
Containerd：1.6.4
KubeKey: v3.0.13

1. 简介

上一期我们完成了 KubeSphere 和 Kubernetes 补丁版本升级实战 , 本期我们实战如何利用 KubeKey 实现 Kubernetes 次要版本升级。

本期内容没有涉及 KubeSphere 的次要版本升级，有需要的读者可以参考对应版本的官方升级文档。

关于跨次要版本升级，个人的考虑如下：

无论是 KubeSphere 还是 Kubernetes 非必要不升级（风险高，不可控的因素太多，就算是提前做了充分的验证测试，谁敢保证生产升级不出意外？）
Kubernetes 非必要不原地升级，建议采用“建设新版本集群 + 迁移业务应用”或是蓝绿升级的方案
一定要原地升级尽量控制在 2 个次要版本之内，且要做好充分的调研验证（比如版本不同、API 不同造成的资源兼容性，升级失败的爆炸半径等）
KubeSphere 的跨版本升级更复杂，启用的额外插件越多，涉及的组件和中间价越多，升级需要考虑验证的点也就越多

KubeKey 支持 All-in-One 集群和多节点集群两种升级场景，本文只实战演示多节点集群的升级场景， All-in-One 集群请参考官方升级指南。

KubeSphere 和 Kubernetes 次要版本升级流程与补丁版本升级流程一致，这里就不过多描述了，详情请看上文。

2. 升级实战前提条件

上一期我们完成了 Kubernetes v1.24.12 到 v1.24.14 升级，本期实战我们将基于同一套环境将 Kubernetes v1.24.14 升级到 v1.26.5。

同时，为了模拟真实的业务场景，我们继续创建一些测试资源。在验证之前，我们还需要记录当前集群的一些关键信息。

2.1 集群环境

KubeSphere v3.4.1，并启用大部分插件
安装 v1.24.14 的 Kubernetes 集群
对接 NFS 存储或是其他存储作为持久化存储（本文测试环境选用 NFS）

2.2 查看当前集群环境信息

下面查看的当前集群环境信息并不充分，只是查看几个具有代表性的资源，肯定有被忽略的组件和信息。

查看所有节点信息

[root@k8s-master-1 ~]# kubectl get nodes -o wide
NAME           STATUS   ROLES                  AGE     VERSION    INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master-1   Ready    control-plane          6d21h   v1.24.14   192.168.9.91   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-master-2   Ready    control-plane,worker   6d21h   v1.24.14   192.168.9.92   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-master-3   Ready    control-plane,worker   6d21h   v1.24.14   192.168.9.93   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-worker-1   Ready    worker                 6d21h   v1.24.14   192.168.9.95   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-worker-2   Ready    worker                 6d21h   v1.24.14   192.168.9.96   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-worker-3   Ready    worker                 6d19h   v1.24.14   192.168.9.97   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4

查看所有的 Deployment 使用的 Image，方便升级后对比（仅作为记录，实际意义不大，结果中不包含 Kubernetes 核心组件）

# 受限于篇幅，输出结果略，请自己保存结果
kubectl get deploy -A -o wide

查看 Kubernetes 资源（受限于篇幅，不展示 pod 结果）

[root@k8s-master-1 ~]# kubectl get pods,deployment,sts,ds -o wide -n kube-system
NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS                     IMAGES                                                                    SELECTOR
deployment.apps/calico-kube-controllers       1/1     1            1           6d21h   calico-kube-controllers        registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controllers:v3.26.1    k8s-app=calico-kube-controllers
deployment.apps/coredns                       2/2     2            2           6d21h   coredns                        registry.cn-beijing.aliyuncs.com/kubesphereio/coredns:1.8.6               k8s-app=kube-dns
deployment.apps/metrics-server                1/1     1            1           6d21h   metrics-server                 registry.cn-beijing.aliyuncs.com/kubesphereio/metrics-server:v0.4.2       k8s-app=metrics-server
deployment.apps/openebs-localpv-provisioner   1/1     1            1           6d21h   openebs-provisioner-hostpath   registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0   name=openebs-localpv-provisioner,openebs.io/component-name=openebs-localpv-provisioner

NAME                                   READY   AGE     CONTAINERS            IMAGES
statefulset.apps/snapshot-controller   1/1     6d21h   snapshot-controller   registry.cn-beijing.aliyuncs.com/kubesphereio/snapshot-controller:v4.0.0

NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE     CONTAINERS    IMAGES                                                                     SELECTOR
daemonset.apps/calico-node    6         6         6       6            6           kubernetes.io/os=linux   6d21h   calico-node   registry.cn-beijing.aliyuncs.com/kubesphereio/node:v3.26.1                 k8s-app=calico-node
daemonset.apps/kube-proxy     6         6         6       6            6           kubernetes.io/os=linux   6d21h   kube-proxy    registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy:v1.24.14          k8s-app=kube-proxy
daemonset.apps/nodelocaldns   6         6         6       6            6           <none>                   6d21h   node-cache    registry.cn-beijing.aliyuncs.com/kubesphereio/k8s-dns-node-cache:1.15.12   k8s-app=nodelocaldns

查看当前 Master 和 Worker 节点使用的 Image

# Master
[root@k8s-master-1 ~]#  crictl images | grep v1.24.14
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-apiserver            v1.24.14            b651b48a617a5       34.3MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controller-manager   v1.24.14            d40212fa9cf04       31.5MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy                v1.24.14            e57c0d007d1ef       39.7MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-scheduler            v1.24.14            19bf7b80c50e5       15.8MB

# Worker
[root@k8s-worker-1 ~]# crictl images | grep v1.24.14
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy                      v1.24.14                       e57c0d007d1ef       39.7MB

查看 Kubernetes 核心组件二进制文件（记录比对是否有升级）

[root@k8s-master-1 ~]# ll /usr/local/bin/
total 352448
-rwxr-xr-x 1 root root  65770992 Dec  4 15:09 calicoctl
-rwxr-xr-x 1 root root  23847904 Nov 29 13:50 etcd
-rwxr-xr-x 1 kube root  17620576 Nov 29 13:50 etcdctl
-rwxr-xr-x 1 root root  46182400 Dec  4 15:09 helm
-rwxr-xr-x 1 root root  44748800 Dec  4 15:09 kubeadm
-rwxr-xr-x 1 root root  46080000 Dec  4 15:09 kubectl
-rwxr-xr-x 1 root root 116646168 Dec  4 15:09 kubelet
drwxr-xr-x 2 kube root        71 Nov 29 13:51 kube-scripts

2.3 创建测试验证资源

创建测试命名空间 upgrade-test

kubectl create ns upgrade-test

创建测试资源文件（使用 StatefulSet，便于快速创建对应的测试卷），使用镜像 nginx:latest 创建 1 个 6 副本的测试业务（包含 pvc），每个副本分布在 1 台 Worker 节点，vi nginx-test.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-nginx
  namespace: upgrade-test
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 6
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: nfs-volume
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: nfs-volume
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "nfs-sc"
      resources:
        requests:
          storage: 1Gi

创建测试资源

kubectl apply -f nginx-test.yaml

写入 index 主页文件

for id in $(seq 0 1 5);do kubectl exec -it test-nginx-$id -n upgrade-test  -- sh -c "echo I test-nginx-$id > /usr/share/nginx/html/index.html";done

查看测试资源

# 查看 Pod（每个节点一个副本）
[root@k8s-master-1 ~]# kubectl get pods -o wide -n upgrade-test
NAME           READY   STATUS    RESTARTS   AGE     IP              NODE           NOMINATED NODE   READINESS GATES
test-nginx-0   1/1     Running   0          8m32s   10.233.80.77    k8s-master-1   <none>           <none>
test-nginx-1   1/1     Running   0          8m10s   10.233.96.33    k8s-master-3   <none>           <none>
test-nginx-2   1/1     Running   0          7m47s   10.233.85.99    k8s-master-2   <none>           <none>
test-nginx-3   1/1     Running   0          7m25s   10.233.87.104   k8s-worker-3   <none>           <none>
test-nginx-4   1/1     Running   0          7m3s    10.233.88.180   k8s-worker-1   <none>           <none>
test-nginx-5   1/1     Running   0          6m41s   10.233.74.129   k8s-worker-2   <none>           <none>

# 查看 PVC
[root@k8s-master-1 ~]# kubectl get pvc -o wide -n upgrade-test
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE     VOLUMEMODE
nfs-volume-test-nginx-0   Bound    pvc-e7875a9c-2736-4b32-aa29-b338959bb133   1Gi        RWO            nfs-sc         8m52s   Filesystem
nfs-volume-test-nginx-1   Bound    pvc-8bd760a8-64f9-40e9-947f-917b6308c146   1Gi        RWO            nfs-sc         8m30s   Filesystem
nfs-volume-test-nginx-2   Bound    pvc-afb47509-c249-4892-91ad-da0f69e33495   1Gi        RWO            nfs-sc         8m7s    Filesystem
nfs-volume-test-nginx-3   Bound    pvc-e6cf5935-2852-4ef6-af3d-a06458bceb49   1Gi        RWO            nfs-sc         7m45s   Filesystem
nfs-volume-test-nginx-4   Bound    pvc-7d176238-8106-4cc9-b49d-fceb25242aee   1Gi        RWO            nfs-sc         7m23s   Filesystem
nfs-volume-test-nginx-5   Bound    pvc-dc35e933-0b3c-4fb9-9736-78528d26ce6f   1Gi        RWO            nfs-sc         7m1s    Filesystem

# 查看 index.html
[root@k8s-master-1 ~]# for id in $(seq 0 1 5);do kubectl exec -it test-nginx-$id -n upgrade-test  -- cat /usr/share/nginx/html/index.html;done
I test-nginx-0
I test-nginx-1
I test-nginx-2
I test-nginx-3
I test-nginx-4
I test-nginx-5

2.4 升级时观测集群和业务状态

本文的观测并不一定全面充分，所以各位在实际升级验证测试时，需要根据真实环境补充。观测过程不便截图，请读者自行观察。

观察集群节点状态

 watch kubectl get nodes

观查测试命名空间的资源状态（重点观察 RESTARTS 是否变化）

watch kubectl get pods -o wide -n upgrade-test

ping 测模拟的业务 IP（随机找一个 Pod）

ping 10.233.80.77

curl 模拟的业务 IP（随机找一个 Pod，跟 Ping 测的不同）

watch curl 10.233.88.180

观测模拟的业务磁盘挂载情况（没验证写入，重点观察输出结果是否有 nfs 存储）

watch kubectl exec -it test-nginx-3 -n upgrade-test  -- df -h

3. 下载 KubeKey

升级集群前执行以下命令，下载最新版或是指定版本的 KubeKey。

export KKZONE=cn
curl -sfL https://get-kk.kubesphere.io | VERSION=v3.0.13 sh -

4. 生成集群部署配置文件

4.1 使用 KubeKey 生成配置文件

升级之前需要准备集群部署文件，首选，建议使用 KubeKey 部署 KubeSphere 和 Kubernetes 集群时使用的配置文件。

如果部署时使用的配置丢失，可以执行以下命令，基于现有集群创建一个 sample.yaml 配置文件（本文重点演示）。

./kk create config --from-cluster

备注 :

本文假设 kubeconfig 位于 ~/.kube/config。您可以通过 --kubeconfig 标志进行修改。

实际命令执行结果如下：

[root@k8s-master-1 kubekey]# ./kk create config --from-cluster
Notice: /root/kubekey/sample.yaml has been created. Some parameters need to be filled in by yourself, please complete it.

生成的配置文件 sample.yaml

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  ##You should complete the ssh information of the hosts
  - {name: k8s-master-1, address: 192.168.9.91, internalAddress: 192.168.9.91}
  - {name: k8s-master-2, address: 192.168.9.92, internalAddress: 192.168.9.92}
  - {name: k8s-master-3, address: 192.168.9.93, internalAddress: 192.168.9.93}
  - {name: k8s-worker-1, address: 192.168.9.95, internalAddress: 192.168.9.95}
  - {name: k8s-worker-2, address: 192.168.9.96, internalAddress: 192.168.9.96}
  - {name: k8s-worker-3, address: 192.168.9.97, internalAddress: 192.168.9.97}
  roleGroups:
    etcd:
    - SHOULD_BE_REPLACED
    master:
    worker:
    - k8s-master-1
    - k8s-master-2
    - k8s-master-3
    - k8s-worker-1
    - k8s-worker-2
    - k8s-worker-3
  controlPlaneEndpoint:
    ##Internal loadbalancer for apiservers
    #internalLoadbalancer: haproxy

    ##If the external loadbalancer was used, 'address' should be set to loadbalancer's ip.
    domain: lb.opsman.top
    address: ""
    port: 6443
  kubernetes:
    version: v1.24.14
    clusterName: opsman.top
    proxyMode: ipvs
    masqueradeAll: false
    maxPods: 110
    nodeCidrMaskSize: 24
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
  registry:
    privateRegistry: ""

4.2 修改配置文件模板

根据实际的集群配置修改 sample.yaml 文件，请确保正确修改以下字段。

hosts：您主机的基本信息（主机名和 IP 地址）以及使用 SSH 连接至主机的信息（重点修改，需要加入 SSH 用户名和密码）。
roleGroups.etcd：etcd 节点（重点修改）。
roleGroups.master：master 节点（重点修改，默认没生成，必须手动加入否则会报错，参见常见问题 1），注意： 该参数字段在部署时生成的配置文件中的名称为 roleGroups.control-plane。
roleGroups.worker：worker 节点（核对修改）。
controlPlaneEndpoint：负载均衡器信息（可选）。
kubernetes.containerManager：修改容器运行时（必选，默认没生成，必须手动加入否则会报错，参见常见问题 2）
registry：镜像服务信息（可选）。

修改后文件内容：

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  ##You should complete the ssh information of the hosts
  - {name: k8s-master-1, address: 192.168.9.91, internalAddress: 192.168.9.91, user: root, password: "P@88w0rd"}
  - {name: k8s-master-2, address: 192.168.9.92, internalAddress: 192.168.9.92, user: root, password: "P@88w0rd"}
  - {name: k8s-master-3, address: 192.168.9.93, internalAddress: 192.168.9.93, user: root, password: "P@88w0rd"}
  - {name: k8s-worker-1, address: 192.168.9.95, internalAddress: 192.168.9.95, user: root, password: "P@88w0rd"}
  - {name: k8s-worker-2, address: 192.168.9.96, internalAddress: 192.168.9.96, user: root, password: "P@88w0rd"}
  - {name: k8s-worker-3, address: 192.168.9.97, internalAddress: 192.168.9.97, user: root, password: "P@88w0rd"}
  roleGroups:
    etcd:
    - k8s-master-1
    - k8s-master-2
    - k8s-master-3
    master:
    - k8s-master-1
    - k8s-master-2
    - k8s-master-3
    worker:
    - k8s-master-1
    - k8s-master-2
    - k8s-master-3
    - k8s-worker-1
    - k8s-worker-2
    - k8s-worker-3
  controlPlaneEndpoint:
    ##Internal loadbalancer for apiservers
    internalLoadbalancer: haproxy

    ##If the external loadbalancer was used, 'address' should be set to loadbalancer's ip.
    domain: lb.opsman.top
    address: ""
    port: 6443
  kubernetes:
    version: v1.24.12
    clusterName: opsman.top
    proxyMode: ipvs
    masqueradeAll: false
    maxPods: 110
    nodeCidrMaskSize: 24
    containerManager: containerd
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
  registry:
    privateRegistry: ""

5. 升级 Kubernetes

5.1 升级 Kubernetes

执行以下命令，将 Kubernetes 从 v1.24.14 升级至 v1.26.5。

export KKZONE=cn
./kk upgrade --with-kubernetes v1.26.5 -f sample.yaml

执行后的结果如下（按提示输入 yes 继续）：

[root@k8s-master-1 kubekey]# ./kk upgrade --with-kubernetes v1.26.5 -f sample.yaml


 _   __      _          _   __
| | / /     | |        | | / /
| |/ / _   _| |__   ___| |/ /  ___ _   _
|    \| | | | '_ \ / _ \    \ / _ \ | | |
| |\  \ |_| | |_) |  __/ |\  \  __/ |_| |
\_| \_/\__,_|_.__/ \___\_| \_/\___|\__, |
                                    __/ |
                                   |___/

09:26:43 CST [GreetingsModule] Greetings
09:26:43 CST message: [k8s-worker-3]
Greetings, KubeKey!
09:26:44 CST message: [k8s-master-3]
Greetings, KubeKey!
09:26:44 CST message: [k8s-master-1]
Greetings, KubeKey!
09:26:44 CST message: [k8s-worker-1]
Greetings, KubeKey!
09:26:44 CST message: [k8s-master-2]
Greetings, KubeKey!
09:26:44 CST message: [k8s-worker-2]
Greetings, KubeKey!
09:26:44 CST success: [k8s-worker-3]
09:26:44 CST success: [k8s-master-3]
09:26:44 CST success: [k8s-master-1]
09:26:44 CST success: [k8s-worker-1]
09:26:44 CST success: [k8s-master-2]
09:26:44 CST success: [k8s-worker-2]
09:26:44 CST [NodePreCheckModule] A pre-check on nodes
09:26:45 CST success: [k8s-worker-2]
09:26:45 CST success: [k8s-worker-3]
09:26:45 CST success: [k8s-master-2]
09:26:45 CST success: [k8s-worker-1]
09:26:45 CST success: [k8s-master-3]
09:26:45 CST success: [k8s-master-1]
09:26:45 CST [ClusterPreCheckModule] Get KubeConfig file
09:26:45 CST skipped: [k8s-master-3]
09:26:45 CST skipped: [k8s-master-2]
09:26:45 CST success: [k8s-master-1]
09:26:45 CST [ClusterPreCheckModule] Get all nodes Kubernetes version
09:26:45 CST success: [k8s-worker-2]
09:26:45 CST success: [k8s-worker-3]
09:26:45 CST success: [k8s-worker-1]
09:26:45 CST success: [k8s-master-2]
09:26:45 CST success: [k8s-master-3]
09:26:45 CST success: [k8s-master-1]
09:26:45 CST [ClusterPreCheckModule] Calculate min Kubernetes version
09:26:45 CST skipped: [k8s-master-3]
09:26:45 CST success: [k8s-master-1]
09:26:45 CST skipped: [k8s-master-2]
09:26:45 CST [ClusterPreCheckModule] Check desired Kubernetes version
09:26:45 CST skipped: [k8s-master-3]
09:26:45 CST success: [k8s-master-1]
09:26:45 CST skipped: [k8s-master-2]
09:26:45 CST [ClusterPreCheckModule] Check KubeSphere version
09:26:45 CST skipped: [k8s-master-3]
09:26:45 CST skipped: [k8s-master-2]
09:26:45 CST success: [k8s-master-1]
09:26:45 CST [ClusterPreCheckModule] Check dependency matrix for KubeSphere and Kubernetes
09:26:45 CST skipped: [k8s-master-3]
09:26:45 CST skipped: [k8s-master-2]
09:26:45 CST success: [k8s-master-1]
09:26:45 CST [ClusterPreCheckModule] Get kubernetes nodes status
09:26:45 CST skipped: [k8s-master-3]
09:26:45 CST skipped: [k8s-master-2]
09:26:45 CST success: [k8s-master-1]
09:26:45 CST [UpgradeConfirmModule] Display confirmation form
+--------------+------+------+---------+----------+-------+-------+---------+-----------+--------+--------+------------+------------+-------------+------------------+--------------+
| name         | sudo | curl | openssl | ebtables | socat | ipset | ipvsadm | conntrack | chrony | docker | containerd | nfs client | ceph client | glusterfs client | time         |
+--------------+------+------+---------+----------+-------+-------+---------+-----------+--------+--------+------------+------------+-------------+------------------+--------------+
| k8s-master-1 | y    | y    | y       | y        | y     | y     | y       | y         | y      |        | v1.6.4     | y          |             |                  | CST 09:26:45 |
| k8s-master-2 | y    | y    | y       | y        | y     | y     | y       | y         | y      |        | v1.6.4     | y          |             |                  | CST 09:26:45 |
| k8s-master-3 | y    | y    | y       | y        | y     | y     | y       | y         | y      |        | v1.6.4     | y          |             |                  | CST 09:26:45 |
| k8s-worker-1 | y    | y    | y       | y        | y     | y     | y       | y         | y      |        | v1.6.4     | y          |             |                  | CST 09:26:45 |
| k8s-worker-2 | y    | y    | y       | y        | y     | y     | y       | y         | y      |        | v1.6.4     | y          |             |                  | CST 09:26:45 |
| k8s-worker-3 | y    | y    | y       | y        | y     | y     | y       | y         | y      |        | v1.6.4     | y          |             |                  | CST 09:26:45 |
+--------------+------+------+---------+----------+-------+-------+---------+-----------+--------+--------+------------+------------+-------------+------------------+--------------+

Cluster nodes status:
NAME           STATUS   ROLES                  AGE     VERSION    INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master-1   Ready    control-plane          6d21h   v1.24.14   192.168.9.91   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-master-2   Ready    control-plane,worker   6d21h   v1.24.14   192.168.9.92   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-master-3   Ready    control-plane,worker   6d21h   v1.24.14   192.168.9.93   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-worker-1   Ready    worker                 6d21h   v1.24.14   192.168.9.95   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-worker-2   Ready    worker                 6d21h   v1.24.14   192.168.9.96   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-worker-3   Ready    worker                 6d19h   v1.24.14   192.168.9.97   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4

Upgrade Confirmation:
kubernetes version: v1.24.14 to v1.26.5

Continue upgrading cluster? [yes/no]:

注意： Upgrade 信息确认，Kubernetes 提示从 v1.24.14 升级至 v1.26.5。

点击「yes」后，删减版最终执行结果如下：

09:47:18 CST [ProgressiveUpgradeModule 2/2] Set current k8s version
09:47:18 CST skipped: [LocalHost]
09:47:18 CST [ChownModule] Chown user $HOME/.kube dir
09:47:19 CST success: [k8s-worker-3]
09:47:19 CST success: [k8s-worker-1]
09:47:19 CST success: [k8s-worker-2]
09:47:19 CST success: [k8s-master-1]
09:47:19 CST success: [k8s-master-2]
09:47:19 CST success: [k8s-master-3]
09:47:19 CST Pipeline[UpgradeClusterPipeline] execute successfully

注意： 由于篇幅限制，我无法粘贴完整的执行结果。然而，我强烈建议各位读者保存并仔细分析升级过程日志，以便更深入地理解 Kubekey 升级 Kubernetes 的流程。这些日志文件提供了关于升级过程中所发生事件的重要信息，可以帮助您更好地理解整个升级过程以及可能遇到的问题，并在需要时进行故障排除。

5.2 升级过程观测说明

升级的过程中观测任务输出结果，有几点需要注意：

Master 和 Worker 节点会逐一升级（Master 2 分钟左右，Worker 1 分钟左右），升级过程中在 Master 节点执行 kubectl 的命令时会出现 API 无法连接的情况

[root@k8s-master-1 ~]# kubectl get nodes
The connection to the server lb.opsman.top:6443 was refused - did you specify the right host or port?

说明：

出现这种现象并不是说 Kubernetes 的 API 没有高可用，实际上是伪高可用。

主要是因为 KubeKey 部署的内置负载均衡 HAProxy 只作用于 Worker 节点，在 Master 节点只会连接本机的 kube-apiserver（因此，也说明有条件还是自建负载均衡比较好）。

# Master 节点
[root@k8s-master-1 ~]# ss -ntlup | grep 6443
tcp    LISTEN     0      32768  [::]:6443               [::]:*                   users:(("kube-apiserver",pid=11789,fd=7))


# Worker 节点
[root@k8s-worker-1 ~]# ss -ntlup | grep 6443
tcp    LISTEN     0      4000   127.0.0.1:6443                  *:*                   users:(("haproxy",pid=53535,fd=7))

测试的 Nginx 业务服务未中断（ping、curl、df 都未见异常）
Kubernetes 核心组件升级顺序：先升级到 v1.25.10，再升级到 v1.26.5（符合 Kubekey 设计和 Kubernetes 次要版本升级差异的要求）
kube-apiserver、kube-controller-manager、kube-proxy、kube-scheduler 镜像从 v1.24.14 先升级到 v1.25.10，然后再升级到 v1.26.5。

升级到 v1.25.10 时下载的二进制软件列表（标粗的说明有变化）：

kubeadm v1.25.10
kubelet v1.25.10
kubectl v1.25.10
helm v3.9.0
kubecni v1.2.0
crictl v1.24.0
etcd v3.4.13
containerd 1.6.4
runc v1.1.1
calicoctl v3.26.1

升级到 v1.26.5 时下载的二进制软件列表（标粗的说明有变化）：

kubeadm v1.26.5
kubelet v1.26.5
kubectl v1.26.5
其他的没变化

升级过程中涉及的 Image 列表（标粗的说明有变化）：

calico/cni:v3.26.1
calico/kube-controllers:v3.26.1
calico/node:v3.26.1
calico/pod2daemon-flexvol:v3.26.1
coredns/coredns:1.9.3
kubesphere/k8s-dns-node-cache:1.15.12
kubesphere/kube-apiserver:v1.25.10
kubesphere/kube-apiserver:v1.26.5
kubesphere/kube-controller-manager:v1.25.10
kubesphere/kube-controller-manager:v1.26.5
kubesphere/kube-proxy:v1.25.10
kubesphere/kube-proxy:v1.26.5
kubesphere/kube-scheduler:v1.25.10
kubesphere/kube-scheduler:v1.26.5
kubesphere/pause:3.8
library/haproxy:2.3

升级后所有节点核心 Image 列表（验证升级版本的顺序）：

# Master 命令有筛选，去掉了 v1.24.12 和重复的 docker.io 开通的镜像
[root@k8s-master-1 ~]# crictl images | grep v1.2[4-6] | grep -v "24.12"| grep -v docker.io | sort
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-apiserver            v1.24.14            b651b48a617a5       34.3MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-apiserver            v1.25.10            4aafc4b1604b9       34.4MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-apiserver            v1.26.5             25c2ecde661fc       35.5MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controller-manager   v1.24.14            d40212fa9cf04       31.5MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controller-manager   v1.25.10            e446ea5ea9b1b       31.5MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controller-manager   v1.26.5             a7403c147a516       32.4MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy                v1.24.14            e57c0d007d1ef       39.7MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy                v1.25.10            0cb798db55ff2       20.3MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy                v1.26.5             08440588500d7       21.5MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-scheduler            v1.24.14            19bf7b80c50e5       15.8MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-scheduler            v1.25.10            de3c37c13188f       16MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-scheduler            v1.26.5             200132c1d91ab       17.7MB

# Worker
[root@k8s-worker-1 ~]# crictl images | grep v1.2[4-6] | grep -v "24.12"| grep -v docker.io | sort
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy                      v1.24.14                       e57c0d007d1ef       39.7MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy                      v1.25.10                       0cb798db55ff2       20.3MB
registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy                      v1.26.5                        08440588500d7       21.5MB

5.3 Kubernetes 升级后验证

查看 Nodes 版本（VERSION 更新为 v1.26.5）

[root@k8s-master-1 ~]# kubectl get nodes -o wide
NAME           STATUS   ROLES                  AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master-1   Ready    control-plane          6d22h   v1.26.5   192.168.9.91   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-master-2   Ready    control-plane,worker   6d22h   v1.26.5   192.168.9.92   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-master-3   Ready    control-plane,worker   6d22h   v1.26.5   192.168.9.93   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-worker-1   Ready    worker                 6d22h   v1.26.5   192.168.9.95   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-worker-2   Ready    worker                 6d22h   v1.26.5   192.168.9.96   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4
k8s-worker-3   Ready    worker                 6d20h   v1.26.5   192.168.9.97   <none>        CentOS Linux 7 (Core)   5.4.261-1.el7.elrepo.x86_64   containerd://1.6.4

查看 Kubernetes 资源（受限于篇幅，不展示 pod 结果，但实际变化都在 Pod 上）

[root@k8s-master-1 kubekey]# kubectl get pod,deployment,sts,ds -o wide -n kube-system
NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS                     IMAGES                                                                    SELECTOR
deployment.apps/calico-kube-controllers       1/1     1            1           6d22h   calico-kube-controllers        registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controllers:v3.26.1    k8s-app=calico-kube-controllers
deployment.apps/coredns                       2/2     2            2           6d22h   coredns                        registry.cn-beijing.aliyuncs.com/kubesphereio/coredns:1.8.6               k8s-app=kube-dns
deployment.apps/metrics-server                1/1     1            1           6d22h   metrics-server                 registry.cn-beijing.aliyuncs.com/kubesphereio/metrics-server:v0.4.2       k8s-app=metrics-server
deployment.apps/openebs-localpv-provisioner   1/1     1            1           6d22h   openebs-provisioner-hostpath   registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0   name=openebs-localpv-provisioner,openebs.io/component-name=openebs-localpv-provisioner

NAME                                   READY   AGE     CONTAINERS            IMAGES
statefulset.apps/snapshot-controller   1/1     6d22h   snapshot-controller   registry.cn-beijing.aliyuncs.com/kubesphereio/snapshot-controller:v4.0.0

NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE     CONTAINERS    IMAGES                                                             SELECTOR
daemonset.apps/calico-node    6         6         6       6            6           kubernetes.io/os=linux   6d22h   calico-node   registry.cn-beijing.aliyuncs.com/kubesphereio/node:v3.26.1         k8s-app=calico-node
daemonset.apps/kube-proxy     6         6         6       6            6           kubernetes.io/os=linux   6d22h   kube-proxy    registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy:v1.26.5   k8s-app=kube-proxy
daemonset.apps/nodelocaldns   6         6         6       6            6           <none>                   6d22h   node-cache    kubesphere/k8s-dns-node-cache:1.15.12                              k8s-app=nodelocaldns

查看二进制文件（对比升级前结果，验证组件是否有变更）

[root@k8s-master-1 ~]# ll /usr/local/bin/
total 360880
-rwxr-xr-x 1 root root  65770992 Dec  6 09:41 calicoctl
-rwxr-xr-x 1 root root  23847904 Nov 29 13:50 etcd
-rwxr-xr-x 1 kube root  17620576 Nov 29 13:50 etcdctl
-rwxr-xr-x 1 root root  46182400 Dec  6 09:41 helm
-rwxr-xr-x 1 root root  46788608 Dec  6 09:41 kubeadm
-rwxr-xr-x 1 root root  48046080 Dec  6 09:41 kubectl
-rwxr-xr-x 1 root root 121277432 Dec  6 09:41 kubelet
drwxr-xr-x 2 kube root        71 Nov 29 13:51 kube-scripts

注意： 除了 etcd、etcdctl 其他都有更新，说明 ETCD 不在组件更新范围内

创建测试资源

kubectl create deployment nginx-upgrade-test --image=nginx:latest --replicas=6 -n upgrade-test

说明： 本测试比较简单，生产环境建议更充分的测试。

查看创建的测试资源

# 查看 Deployment
[root@k8s-master-1 ~]# kubectl get deployment -n upgrade-test -o wide
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES         SELECTOR
nginx-upgrade-test   6/6     6            6           13s   nginx        nginx:latest   app=nginx-upgrade-test

# 查看 Pod
[root@k8s-master-1 ~]# kubectl get deployment,pod -n upgrade-test -o wide
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES         SELECTOR
deployment.apps/nginx-upgrade-test   6/6     6            6           56s   nginx        nginx:latest   app=nginx-upgrade-test

NAME                                      READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
pod/nginx-upgrade-test-7b668447c4-cglmh   1/1     Running   0          56s   10.233.80.85    k8s-master-1   <none>           <none>
pod/nginx-upgrade-test-7b668447c4-cmzqx   1/1     Running   0          56s   10.233.88.181   k8s-worker-1   <none>           <none>
pod/nginx-upgrade-test-7b668447c4-ggnjg   1/1     Running   0          56s   10.233.87.106   k8s-worker-3   <none>           <none>
pod/nginx-upgrade-test-7b668447c4-hh8h4   1/1     Running   0          56s   10.233.96.38    k8s-master-3   <none>           <none>
pod/nginx-upgrade-test-7b668447c4-hs7n7   1/1     Running   0          56s   10.233.74.130   k8s-worker-2   <none>           <none>
pod/nginx-upgrade-test-7b668447c4-s6wps   1/1     Running   0          56s   10.233.85.102   k8s-master-2   <none>           <none>
pod/test-nginx-0                          1/1     Running   0          85m   10.233.80.77    k8s-master-1   <none>           <none>
pod/test-nginx-1                          1/1     Running   0          85m   10.233.96.33    k8s-master-3   <none>           <none>
pod/test-nginx-2                          1/1     Running   0          84m   10.233.85.99    k8s-master-2   <none>           <none>
pod/test-nginx-3                          1/1     Running   0          84m   10.233.87.104   k8s-worker-3   <none>           <none>
pod/test-nginx-4                          1/1     Running   0          83m   10.233.88.180   k8s-worker-1   <none>           <none>
pod/test-nginx-5                          1/1     Running   0          83m   10.233.74.129   k8s-worker-2   <none>           <none>

重建已有 Pod（未验证，生产环境升级必须验证，防止出现跨版本的兼容性问题）
在 KubeSphere 管理控制台查看集群状态

经过一系列的操作，我们成功地利用 KubeKey 完成了 Kubernetes 次要版本的升级和测试验证。在这个过程中，我们经历了多个关键环节，包括实战环境准备、升级实施以及测试验证等。最终，我们实现了版本升级的目标，并且验证了升级后的系统（基本）能够正常运行。

6. 常见问题

问题 1

报错信息

[root@k8s-master-1 kubekey]# ./kk upgrade --with-kubesphere v3.4.1 -f sample.yaml
14:00:54 CST [FATA] The number of master/control-plane cannot be 0

解决方案

修改集群部署文件 sample.yaml，正确填写 roleGroups.master：master 节点信息

问题 2

报错信息

Continue upgrading cluster? [yes/no]: yes
14:07:02 CST success: [LocalHost]
14:07:02 CST [SetUpgradePlanModule 1/2] Set upgrade plan
14:07:02 CST success: [LocalHost]
14:07:02 CST [SetUpgradePlanModule 1/2] Generate kubeadm config
14:07:02 CST message: [k8s-master-1]
Failed to get container runtime cgroup driver.: Failed to exec command: sudo -E /bin/bash -c "docker info | grep 'Cgroup Driver'"
/bin/bash: docker: command not found: Process exited with status 1
14:07:02 CST retry: [k8s-master-1]
14:07:07 CST message: [k8s-master-1]
Failed to get container runtime cgroup driver.: Failed to exec command: sudo -E /bin/bash -c "docker info | grep 'Cgroup Driver'"
/bin/bash: docker: command not found: Process exited with status 1
14:07:07 CST retry: [k8s-master-1]
14:07:12 CST message: [k8s-master-1]
Failed to get container runtime cgroup driver.: Failed to exec command: sudo -E /bin/bash -c "docker info | grep 'Cgroup Driver'"
/bin/bash: docker: command not found: Process exited with status 1
14:07:12 CST skipped: [k8s-master-3]
14:07:12 CST skipped: [k8s-master-2]
14:07:12 CST failed: [k8s-master-1]
error: Pipeline[UpgradeClusterPipeline] execute failed: Module[SetUpgradePlanModule 1/2] exec failed:
failed: [k8s-master-1] [GenerateKubeadmConfig] exec failed after 3 retries: Failed to get container runtime cgroup driver.: Failed to exec command: sudo -E /bin/bash -c "docker info | grep 'Cgroup Driver'"
/bin/bash: docker: command not found: Process exited with status 1

解决方案

修改集群部署文件 sample.yaml，正确填写 kubernetes.containerManager: containerd，默认使用 Docker。

7. 总结

本文通过实战演示了 KubeKey 部署的 KubeSphere 和 Kubernetes 集群升级 Kubernetes 次要版本的详细过程，以及升级过程中遇到的问题和对应的解决方案。同时，也阐述了在升级前和升级后需要进行哪些验证，以确保系统升级的成功。

利用 KubeKey 升级 KubeSphere 和 Kubernetes 并完成测试验证是一个复杂而重要的任务。通过这次升级实战，我们验证了技术可行性、实现了系统的更新和优化，提高了系统的性能和安全性。同时，我们也积累了宝贵的经验教训，为未来的生产环境升级工作提供了参考和借鉴。

概括总结全文主要涉及以下内容：

升级实战环境准备
Kubernetes 升级准备和升级过程监测
利用 KubeKey 升级 Kubernetes
Kubernetes 升级后验证

本文所提供的实战内容可以直接应用于测试和研发环境。同时，对于生产环境也有一定的参考价值。然而，请务必谨慎对待，不要直接将其应用于生产环境。

本文由博客一文多发平台 OpenWrite 发布！

热门相关：美食供应商陆鸣至尊神殿惊世第一妃：魔帝，宠上身！圣人门徒无上崛起