Why Kubernetes fails to start after a server reboot

1. Symptom

[root@master-node ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
  Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
          └─10-kubeadm.conf
  Active: activating (auto-restart) (Result: exit-code) since 五 2021-11-26 13:39:00 CST; 9s ago
    Docs: https://kubernetes.io/docs/
Process: 8824 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 8824 (code=exited, status=1/FAILURE)

11月 26 13:39:00 master-node systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
11月 26 13:39:00 master-node systemd[1]: Unit kubelet.service entered failed state.
11月 26 13:39:00 master-node systemd[1]: kubelet.service failed.
[root@master-node ~]#

2. Reproducing the failure

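On a 3-node Kubernetes cluster, the kubelet service failed to start after the master node was rebooted. Neither a manual start nor systemd's automatic restart succeeded; every attempt ended with the error shown above, and the status output gave no obvious hint as to why.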

3. Cause

Swap was enabled on the server. By default, kubelet refuses to start while swap is on.

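In fact, df -Th on this node showed no swap partition. That is expected: df only reports mounted filesystems and does not list swap areas, so its output says nothing about whether swap is on. After running swapoff -a, the kubelet service started normally.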

[root@master-node ~]# swapoff -a
[root@master-node ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
  Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
          └─10-kubeadm.conf
  Active: activating (auto-restart) (Result: exit-code) since 五 2021-11-26 13:39:00 CST; 9s ago
    Docs: https://kubernetes.io/docs/
Process: 8824 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 8824 (code=exited, status=1/FAILURE)

11月 26 13:39:00 master-node systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
11月 26 13:39:00 master-node systemd[1]: Unit kubelet.service entered failed state.
11月 26 13:39:00 master-node systemd[1]: kubelet.service failed.
[root@master-node ~]# systemctl start kubelet
[root@master-node ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
  Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
          └─10-kubeadm.conf
  Active: active (running) since 五 2021-11-26 13:39:10 CST; 5s ago
    Docs: https://kubernetes.io/docs/
Main PID: 8854 (kubelet)
  Tasks: 15
  Memory: 44.0M
  CGroup: /system.slice/kubelet.service
          └─8854 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-co...

11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.449196    8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.550136    8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.650889    8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.750974    8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.852678    8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.953314    8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:16 master-node kubelet[8854]: E1126 13:39:16.054236    8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:16 master-node kubelet[8854]: E1126 13:39:16.155038    8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:16 master-node kubelet[8854]: E1126 13:39:16.256030    8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:16 master-node kubelet[8854]: E1126 13:39:16.356340    8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
[root@master-node ~]# kubectl get nodes
NAME         STATUS   ROLES                 AGE   VERSION
master-node   Ready   control-plane,master   21d   v1.22.3
node-1       Ready   <none>                 21d   v1.22.3
node-2       Ready   <none>                 21d   v1.22.3
[root@master-node ~]#
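
Because df does not list swap, a more direct check is the kernel's own list in /proc/swaps, which is also what swapon -s prints. The following is a minimal sketch, not part of the original troubleshooting session. The helper name swap_is_active is hypothetical, and its optional file argument exists only so that it can be tried against a sample file.

```shell
#!/usr/bin/env bash
# swap_is_active: hypothetical helper, returns 0 if any swap area is active.
# Reads /proc/swaps by default; a different path can be passed for testing.
swap_is_active() {
    local swaps_file="${1:-/proc/swaps}"
    # Line 1 is the column header; any further non-empty line is an active swap area.
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit (found ? 0 : 1) }' "$swaps_file"
}

if swap_is_active; then
    echo "swap is active: kubelet will not start with its default settings"
else
    echo "no active swap found"
fi
```

On the node above, this would have reported active swap before swapoff -a was run, even though df -Th showed nothing.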

4. Permanent fix

After this master node rebooted, df -Th showed no swap partition, but that does not mean swap was off. For example:

[root@master-node ~]# ll /etc/fstab 
-rw-r--r--. 1 root root 465 1月   8 2020 /etc/fstab
[root@master-node ~]# cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Wed Jan 8 17:41:56 2020
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /                       xfs     defaults        0 0
UUID=0c810ea4-9c87-4512-a1ed-71dfdc89498b /boot                   xfs     defaults        0 0
/dev/mapper/centos-swap swap                   swap   defaults        0 0
[root@master-node ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/dm-1                               partition       8257532 0       -1
[root@master-node ~]# swapoff -a
[root@master-node ~]# swapon -s
[root@master-node ~]#

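As shown, /etc/fstab contains a swap entry, and swapon -s lists the swap area currently in use, even though df -Th shows nothing about swap. Running swapoff -a only disables swap until the next reboot, because the fstab entry turns it back on at boot. To fix the problem permanently, so that Kubernetes can start after the next reboot, comment out or delete the swap entry in /etc/fstab.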
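
Commenting out the swap entry can be scripted. The following is a minimal sketch under stated assumptions, not the exact steps used in this article. The helper name comment_out_swap and the .bak backup suffix are hypothetical choices, and the sketch assumes the swap entry has the usual layout with swap as the third field (the filesystem type), as in the fstab shown above.

```shell
#!/usr/bin/env bash
# comment_out_swap: hypothetical helper that comments out swap entries
# in the fstab file given as its first argument.
comment_out_swap() {
    local fstab="$1"
    # Keep a backup next to the original before changing anything.
    cp -p "$fstab" "$fstab.bak"
    # Prefix '#' to every line that is not already a comment and whose
    # third field (filesystem type) is "swap"; copy all other lines unchanged.
    awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
        "$fstab.bak" > "$fstab"
}

# Intended usage, as root (deliberately not executed here):
#   comment_out_swap /etc/fstab
#   swapoff -a
```

The function only edits the file. Swap that is already active stays on until swapoff -a is run or the machine is rebooted.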

5. References

https://stackoverflow.com/questions/62407918/kubelet-service-is-not-starting
