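When a node goes down, we want the pods on it to be evacuated to other nodes quickly so they can keep serving. In testing, however, it took 5 minutes before those pods were evacuated.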
Guides online suggest changing the following kube-controller-manager flags:
# /etc/systemd/system/kube-controller-manager.service
--node-monitor-grace-period=10s \
--node-monitor-period=3s \
--node-startup-grace-period=20s \
--pod-eviction-timeout=10s \
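How Kubernetes reschedules pods after a node fails:
0. The master (the node controller in kube-controller-manager) checks each node at a fixed interval to determine whether it has lost contact. This interval is node-monitor-period, default 5s.
1. Once a node has been out of contact for a certain time, Kubernetes marks it NotReady. This duration is node-monitor-grace-period, default 40s.
2. For a node that is still starting up, Kubernetes allows a longer unresponsive period before marking it unhealthy. This duration is node-startup-grace-period, default 1m0s.
3. Once a node has been out of contact for a certain time, Kubernetes starts deleting the pods on it. This duration is pod-eviction-timeout, default 5m0s.
In practice, to shorten the time before pods are restarted elsewhere, you can tune the parameters above.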
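The steps above can be turned into a rough back-of-the-envelope estimate. The following Python sketch assumes the node controller notices the missed heartbeats at most one node-monitor-period after node-monitor-grace-period expires, and then waits pod-eviction-timeout before deleting pods. It models only the timeline as listed here (no eviction rate limiting, no version differences), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ControllerFlags:
    # Values in seconds; defaults mirror the kube-controller-manager defaults listed above.
    node_monitor_period: float = 5.0          # --node-monitor-period
    node_monitor_grace_period: float = 40.0   # --node-monitor-grace-period
    pod_eviction_timeout: float = 300.0       # --pod-eviction-timeout


def legacy_eviction_estimate(flags: ControllerFlags) -> tuple[float, float]:
    """Return (best_case, worst_case) seconds from the node's last status
    report to pod deletion, under the simplified flow described above."""
    # Best case: the controller checks right as the grace period expires.
    best = flags.node_monitor_grace_period + flags.pod_eviction_timeout
    # Worst case: the check happens up to one monitor period later.
    worst = best + flags.node_monitor_period
    return best, worst


if __name__ == "__main__":
    print(legacy_eviction_estimate(ControllerFlags()))
    print(legacy_eviction_estimate(ControllerFlags(
        node_monitor_period=3, node_monitor_grace_period=10, pod_eviction_timeout=10)))
```

With the defaults this gives 340 to 345 seconds; with the flags from the guide above it gives 20 to 23 seconds, which is what one would expect if the listed flow were the whole story.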
The official documentation describes these flags as follows:
| Flag | Type | Default | Description |
|---|---|---|---|
| --node-monitor-grace-period | duration | 40s | Amount of time which we allow running Node to be unresponsive before marking it unhealthy. Must be N times more than kubelet's nodeStatusUpdateFrequency, where N means number of retries allowed for kubelet to post node status. |
| --node-monitor-period | duration | 5s | The period for syncing NodeStatus in NodeController. |
| --node-startup-grace-period | duration | 1m0s | Amount of time which we allow starting Node to be unresponsive before marking it unhealthy. |
| --pod-eviction-timeout | duration | 5m0s | The grace period for deleting pods on failed nodes. |
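The constraint on node-monitor-grace-period quoted in the table can be checked with simple arithmetic. This sketch assumes the kubelet's default nodeStatusUpdateFrequency of 10s and treats N as the number of whole status-update intervals that fit into the grace period; the min_retries threshold is a hypothetical safety margin, not a Kubernetes setting.

```python
def allowed_missed_updates(grace_period_s: float, status_update_frequency_s: float) -> int:
    """Number of whole kubelet status-update intervals that fit in the grace period."""
    if status_update_frequency_s <= 0:
        raise ValueError("status update frequency must be positive")
    return int(grace_period_s // status_update_frequency_s)


def check_grace_period(grace_period_s: float,
                       status_update_frequency_s: float = 10.0,  # kubelet default
                       min_retries: int = 2):                    # hypothetical margin
    """Return (N, ok): N intervals fit, ok if N meets the chosen margin."""
    n = allowed_missed_updates(grace_period_s, status_update_frequency_s)
    return n, n >= min_retries


if __name__ == "__main__":
    print(check_grace_period(40))  # default grace period
    print(check_grace_period(10))  # grace period from the guide above
```

With the default 40s grace period, four intervals fit. With the 10s value from the guide above, only one fits, so a single delayed status update could be enough to mark the node unhealthy; if you shorten the grace period that much, it may be worth lowering the kubelet's nodeStatusUpdateFrequency as well.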
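In practice, however, pod-eviction-timeout has no effect: it is ignored when taint-based eviction is enabled, which is the default. Instead, every pod (for example, the pods created by a Deployment) is given tolerations for the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable taints by default, with tolerationSeconds set to 300 (5 minutes). These two tolerations directly determine how long it takes to recover a pod once its node becomes unhealthy, which matches the 5-minute delay observed above.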
References: "ignored pod-eviction-timeout settings", Issue #74651, kubernetes/kubernetes on GitHub; https://www.cnblogs.com/cptao/p/10911959.html
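Under taint-based eviction, a rough estimate replaces pod-eviction-timeout with the pod's tolerationSeconds. The sketch below assumes the node is tainted at most one node-monitor-period after node-monitor-grace-period expires, and it ignores eviction rate limiting and the time needed to start the replacement pod; the function name is hypothetical.

```python
def taint_based_eviction_estimate(node_monitor_grace_period: float,
                                  toleration_seconds: float,
                                  node_monitor_period: float = 5.0) -> tuple[float, float]:
    """Return (best_case, worst_case) seconds from the node's last status
    report to pod eviction. pod-eviction-timeout does not appear here."""
    # The node is tainted once the grace period expires; the pod then
    # keeps running until its NoExecute toleration's tolerationSeconds runs out.
    best = node_monitor_grace_period + toleration_seconds
    # The controller may notice up to one monitor period late.
    return best, best + node_monitor_period


if __name__ == "__main__":
    print(taint_based_eviction_estimate(40, 300))     # all defaults
    print(taint_based_eviction_estimate(10, 300, 3))  # guide's controller flags only
    print(taint_based_eviction_estimate(10, 60, 3))   # plus 60s tolerations
```

For example, taint_based_eviction_estimate(10, 300, 3) returns (310, 313): with the guide's controller flags but the default 300s toleration, pods still take over five minutes to be evicted, which is consistent with pod-eviction-timeout appearing to have no effect. Lowering tolerationSeconds to 60 gives (70, 73).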
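So there are two ways to make pods recover faster.
1. Change the defaults via kube-apiserver startup flags. The new values only apply to pods created after the change; existing pods must be recreated to pick them up.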
# /etc/kubernetes/manifests/kube-apiserver.yaml
.....
- --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
- --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
- --default-not-ready-toleration-seconds=60
- --default-unreachable-toleration-seconds=60
Add the last two lines.
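To see why the apiserver flags only affect newly created pods, and why explicit tolerations take precedence, here is a simplified Python model of the DefaultTolerationSeconds admission plugin that applies these defaults. It is an approximation written for illustration, not the plugin's source: the pod spec is a plain dict, the function names are hypothetical, and the wildcard handling (an empty key or effect counting as a match) reflects our understanding of the plugin's check.

```python
import copy

NOT_READY = "node.kubernetes.io/not-ready"
UNREACHABLE = "node.kubernetes.io/unreachable"


def _tolerates(tolerations, key):
    """True if any toleration already covers the NoExecute taint `key`.
    An empty key or empty effect is treated as a wildcard (assumption)."""
    for t in tolerations:
        key_ok = t.get("key", "") in ("", key)
        effect_ok = t.get("effect", "") in ("", "NoExecute")
        if key_ok and effect_ok:
            return True
    return False


def admit_pod(pod_spec, not_ready_seconds=300, unreachable_seconds=300):
    """Return a copy of pod_spec with default tolerations added where missing.
    The two seconds arguments stand in for --default-not-ready-toleration-seconds
    and --default-unreachable-toleration-seconds."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's dict untouched
    tolerations = spec.setdefault("tolerations", [])
    for key, seconds in ((NOT_READY, not_ready_seconds),
                         (UNREACHABLE, unreachable_seconds)):
        if not _tolerates(tolerations, key):
            tolerations.append({
                "key": key,
                "operator": "Exists",
                "effect": "NoExecute",
                "tolerationSeconds": seconds,
            })
    return spec
```

Because this logic runs only when a pod is admitted, pods created before the flag change keep their old 300-second tolerations until they are recreated, and a pod that already declares its own not-ready/unreachable tolerations (as in method 2) is left unchanged.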
2. Override the defaults with tolerations when creating the Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 2
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 2
      containers:
      - image: busybox
        command:
        - sleep
        - "3600"
        imagePullPolicy: IfNotPresent
        name: busybox
      restartPolicy: Always
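For an existing Deployment, the same tolerations can also be applied as a patch instead of editing the full manifest. This hypothetical helper builds a JSON merge patch for the pod template. Note that a merge patch replaces the whole tolerations list, and that changing the pod template triggers a rolling update, which recreates the pods.

```python
import json

# Taints whose NoExecute tolerations control eviction from an unhealthy node.
TAINT_KEYS = ("node.kubernetes.io/unreachable", "node.kubernetes.io/not-ready")


def toleration_patch(seconds: int) -> dict:
    """Build a merge patch that sets short NoExecute tolerations on the pod template."""
    tolerations = [
        {
            "key": key,
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": seconds,
        }
        for key in TAINT_KEYS
    ]
    return {"spec": {"template": {"spec": {"tolerations": tolerations}}}}


if __name__ == "__main__":
    # Print the patch as JSON so it can be passed to kubectl.
    print(json.dumps(toleration_patch(2)))
```

For example (script name hypothetical): kubectl patch deployment busybox -n default --type merge -p "$(python3 toleration_patch.py)"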