One of the great things about Kubernetes is that it helps improve the availability and resilience of your applications. That doesn’t mean disruptions can’t happen; there are a number of reasons for them, such as draining a node for a Kubernetes upgrade. To prevent downtime you should run multiple instances of your application, and you can use a PodDisruptionBudget (PDB) to guarantee availability during disruptions. This is a great feature to have, because it can prevent downtime when, for example, you’re upgrading a Kubernetes node. But if you do not configure your Pod Disruption Budget correctly, it can create unexpected situations.
At our team, the Professional Development Center of Info Support, we use Azure Kubernetes Service (AKS) to host our Kubernetes cluster. AKS manages the control plane, which means we don’t have to worry about that part of the cluster. Another benefit of AKS is that it helps with upgrading Kubernetes nodes: AKS manages the version upgrade of the nodes, and during such an upgrade the pods on a node are automatically drained. One day we discovered that we had more nodes than we had originally configured in our cluster. During a sprint review we wanted to show our AKS setup and that our node pool consisted of 4 nodes, but to our surprise there were no fewer than 11 nodes in the pool. This happened during a failed upgrade of the cluster: due to a misconfiguration, a node could not be drained. When a node is drained, AKS creates another node to run the pods of the drained node. Because the drain did not complete successfully, the cluster ended up in an unusual state. Sure, part of this problem was how AKS manages the upgrade of nodes, but the underlying problem was how we configured the Pod Disruption Budget.
I will demonstrate how a Pod Disruption Budget configuration can cause problems during a node upgrade.
Example 1:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mydeployment
  labels:
    app: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myimage:1.0
        ports:
        - containerPort: 80
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp
The Deployment runs only 1 replica, while the Pod Disruption Budget demands that at least 1 pod stays available. Evicting that single pod would violate the budget, so the node it runs on cannot be drained.
Example 2:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mydeployment
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myimage:1.0
        ports:
        - containerPort: 80
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-pdb
spec:
  minAvailable: 80%
  selector:
    matchLabels:
      app: myapp
Here the Deployment runs 3 replicas and the Pod Disruption Budget demands that 80% of them stay available. 80% of 3 is 2.4, which Kubernetes rounds up to 3, so again no pod may be evicted and the drain is blocked.
Horizontal Pod Autoscaler
In our case we were using a Horizontal Pod Autoscaler to scale our pods. It manages the scaling of Deployments (and other resource types): you configure a minimum and a maximum number of pods it should scale to, based on a certain metric. This is a very nice way of reacting to a varying amount of load on our pods. But we did not configure the Pod Disruption Budget correctly. A Pod Disruption Budget looks at the pods that are currently running, not at the number of pods the Horizontal Pod Autoscaler could scale to. The following examples illustrate the problem.
Example 3:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-autoscale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mydeployment
  minReplicas: 1
  maxReplicas: 5
We configure the Horizontal Pod Autoscaler with a minimum of 1 and a maximum of 5. There is a small amount of load, so the Deployment is scaled to 2 replicas. The Pod Disruption Budget is configured with a minAvailable of 80%. Now we calculate this minimum: 80% of 2 is 1.6, which is rounded up to 2. All 2 pods must therefore stay available, so the Pod Disruption Budget prevents the draining of the node.
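If you want to verify this, you can query the status of the Pod Disruption Budget directly. The name my-pdb comes from the examples above; with 2 running replicas and a minAvailable of 80% this would print 0:
# Prints how many voluntary disruptions the budget currently allows (0 in this scenario)
kubectl get pdb my-pdb -o jsonpath='{.status.disruptionsAllowed}'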
Example 4:
This example shows that Kubernetes upgrades will, ironically, only succeed when we perform them during peak hours. We use the same configuration as in example 3, but now the load is very high and the Deployment is therefore scaled to the maximum of 5 replicas. Now we calculate the minAvailable again: 80% of 5 is 4. This means we can bring 1 pod down, and the drain can be performed.
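As a sketch of how you would run into this yourself, a manual drain (the node name below is just a placeholder) respects Pod Disruption Budgets: in the low-load scenario of example 3 the eviction is refused and the drain keeps retrying, while in the high-load scenario of example 4 it completes.
# Drains the node while respecting Pod Disruption Budgets; DaemonSet-managed pods are skipped
kubectl drain <node-name> --ignore-daemonsets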
Resource shortage
The pods of the drained node will be started on a different node. In the previous examples I assumed that another node has enough capacity to run the pods that were evicted during the drain. If a pod is evicted and no other node has enough resources to run it, the drain can also get stuck when a Pod Disruption Budget is set.
Example 5:
A Deployment is set to 3 replicas. We configure the Pod Disruption Budget with a minAvailable of 2 pods, which means we can bring 1 pod down. The pod is brought down. Due to the Pod Disruption Budget we cannot bring down any more pods until there is a total of 3 running pods again. But there is no node with enough resources to bring the evicted pod up again, so the node will not drain successfully.
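A way to spot this situation is to look for pods that cannot be scheduled; they stay in the Pending phase until a node with enough free resources is available:
# Lists pods that are stuck in Pending, for example because no node has room for them
kubectl get pods --all-namespaces --field-selector=status.phase=Pending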
Blocking disruptions on purpose
It can be a valid choice to make it impossible to cause disruptions. An example is an application that can only permit downtime under controlled circumstances. The development team could set a maxUnavailable of 0, as shown in the sketch below. The operators of the Kubernetes cluster then cannot upgrade the node where this pod is running and have to contact the development team. Together they can then prepare for the downtime.
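As a minimal sketch, reusing the hypothetical myapp labels from the earlier examples, such a budget could look like this:
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-pdb
spec:
  # Blocks every voluntary eviction of the matching pods
  maxUnavailable: 0
  selector:
    matchLabels:
      app: myapp
With this in place, every voluntary eviction is refused until the budget is changed or deleted.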
Check your setup
Sometimes you don’t even know you are using Pod Disruption Budgets. They could be packaged in a Helm (sub)chart or installed via an Operator. You can check whether you have Pod Disruption Budgets installed with:
kubectl get pdb
This gives you an overview of all the Pod Disruption Budgets in the default namespace; of course you should adjust the namespace to your situation. The result looks like this:
NAME                         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
gitlab-gitaly                N/A             1                 1                     73d
gitlab-gitlab-shell          N/A             1                 1                     73d
gitlab-minio-v1              N/A             1                 1                     73d
gitlab-registry-v1           N/A             1                 1                     73d
gitlab-sidekiq-all-in-1-v1   N/A             1                 1                     73d
gitlab-webservice-default    N/A             1                 1                     73d
The ALLOWED DISRUPTIONS column is very useful because it shows whether you can introduce disruptions. If this value is 0, you cannot successfully drain the node where the matching pods are running. A command that lists the Pod Disruption Budgets that allow no disruptions is (again, add your namespace option where needed):
kubectl get pdb -o=jsonpath='{.items[?(@.status.disruptionsAllowed==0)].metadata.name}'
Conclusion
The big takeaway about configuring a Pod Disruption Budget is to always be aware of what you actually configure. Always check whether a Helm chart or an Operator installs a Pod Disruption Budget for you. When you do configure a Pod Disruption Budget, ask yourself not only to what extent unscheduled disruptions can happen, but also how it affects your scheduled disruptions, like node upgrades. Does this cause issues during maintenance? If so, at least be aware of it and remember the impact it can have.