Kubernetes offers orchestration for your applications, with fantastic, sensible out-of-the-box defaults that make it easy to plug and play. But to achieve seamless deployments that do not disrupt active users and do not lose in-flight requests, we need to take a couple of extra steps.
Rolling Updates and the Like
Rolling updates allow a Deployment's update to take place with zero downtime by incrementally replacing Pod instances with new ones. The new Pods are scheduled on Nodes with available resources.
By default, Kubernetes Deployments use the rolling update strategy when updating, the reason being to minimize application downtime. Engineers can then further configure the strategy according to their specifications.
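The pace of a rolling update is tuned through the strategy field of the Deployment spec. A minimal sketch of such a fragment is shown below; the replica count and values are illustrative choices, not the defaults (which are 25% for both maxSurge and maxUnavailable):

```yaml
# fragment of a Deployment spec (apps/v1); values are illustrative
spec:
  replicas: 4
  strategy:
    type: RollingUpdate        # the default strategy type
    rollingUpdate:
      maxSurge: 1              # at most 1 extra Pod above the desired count
      maxUnavailable: 0        # never drop below the desired replica count
```

Setting maxUnavailable to 0 forces Kubernetes to bring a new Pod up before taking an old one down, at the cost of briefly needing headroom for one extra Pod.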
Gaps and Where to Find Them
Once we initiate a deployment update on Kubernetes and follow the output, whether on the Kubernetes Dashboard, in K9s, or via kubectl, what we may notice is that the switch from an old pod to a new version is far from smooth: the application may lose some client requests.
In order to test whether in-flight requests are lost, especially those made against an old pod during the switch to the new version, we can make use of load-testing tools. What we will be looking at is making sure all HTTP requests are handled properly, including keep-alive connections. For this particular article, we will be making use of Fortio, but Apache Bench can be used as well.
We connect to our application over HTTP from multiple threads concurrently. Our focus will be solely on response statuses and connection failures. In the example below, we use Fortio to issue 500 requests per second over 50 concurrent keep-alive connections.
$ fortio load -a -c 50 -qps 500 -t 20s "http://foo.bar:8080"
The -a flag tells Fortio to save the report so that we can view it later in the GUI. Once we fire this test against a rolling-update deployment, we will see a few requests fail to connect:
Fortio 1.4.1 running at 500 queries per second, 4->4 procs, for 20s
Starting at 500 qps with 50 thread(s) [gomax 4] for 20s : 200 calls each (total 10000)
10:11:25 W http_client.go:673> Parsed non ok code 502 (HTTP/1.1 502)
[...]
Code 200 : 9933 (99.3 %)
Code 502 : 67 (0.7 %)
Response Header Sizes : count 10000 avg 158.469 +/- 13.03 min 0 max 160 sum 1584692
Response Body/Total Sizes : count 10000 avg 169.786 +/- 12.1 min 161 max 314 sum 1697861
[...]
So what happens under the hood is that Kubernetes reroutes traffic during the rolling update from an old pod to a new one. The service VIP is resolved via cluster DNS and ends up at a Pod instance. This is done via kube-proxy, which runs on every Kubernetes node and updates the iptables rules that route traffic to the IP addresses of the pods.
Kubernetes will take an old pod out of rotation by updating its status to Terminating, removing it from the Endpoints object, and then sending it a SIGTERM. The SIGTERM causes the container to shut down gracefully: it stops accepting new requests but finishes serving the remaining ones. Meanwhile, Kubernetes routes traffic to the new pod once the old pod has been removed from the Endpoints object. This is what causes the gap in our deployment: the pod is deactivated by the termination signal before the load balancer notices the change and updates its configuration. Because this happens asynchronously, requests may still be routed to the deactivating pod.
Onward to Zero-Downtime
The first step is to make sure that our containers, or rather the applications running in them, handle termination signals correctly, that is, the process gracefully shuts down on a Unix SIGTERM.
The next step is to include a readiness probe that checks whether the application is ready to handle traffic; you do not want traffic routed to a new pod that is not yet ready to handle it. This is especially important for applications with startup times longer than a second.
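A readiness probe on the application container might look like the following sketch, assuming the application exposes an HTTP health endpoint; the /ready path, port, and timings here are placeholders to adapt to your application:

```yaml
# container-level fragment of the pod template
readinessProbe:
  httpGet:
    path: /ready           # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 5   # wait for the app to boot before probing
  periodSeconds: 2
  failureThreshold: 3
```

A pod only receives traffic from the service once its readiness probe succeeds, so new pods are kept out of rotation until they can actually serve.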
Next up is addressing the pod terminating while the load balancer is still being reconfigured. For this we include a preStop lifecycle hook. This hook is called before the container terminates and is synchronous: it must complete before the final termination signal is sent. So in the scenario below, we use the hook to wait before the SIGTERM terminates the application process and the container. Meanwhile, Kubernetes reconfigures the service and removes the pod from the Endpoints object. The wait in our hook ensures the service is fully reconfigured before the application process halts.
To implement this, we define a preStop hook and a terminationGracePeriodSeconds as below:
# application container
lifecycle:
  preStop:
    exec:
      command:
        - /bin/bash
        - -c
        - sleep 20

# for entire pod
terminationGracePeriodSeconds: 160
We added terminationGracePeriodSeconds to ensure that once the pod has spent 160 seconds in the Terminating state, it is removed forcibly. The terminationGracePeriodSeconds must be bigger than the wait we set in our preStop hook.
When we now observe the behavior of our pods during a deployment, we will immediately notice that once a pod is in the Terminating state, it is not shut down until the wait time has elapsed. And when we re-test our approach using Fortio, we will notice that we now have zero failed requests:
Fortio 1.4.1 running at 500 queries per second, 4->4 procs, for 20s
Starting at 500 qps with 50 thread(s) [gomax 4] for 20s : 200 calls each (total 10000)
[...]
Code 200 : 10000 (100.0 %)
Response Header Sizes : count 10000 avg 159.530 +/- 0.706 min 154 max 160 sum 1595305
Response Body/Total Sizes : count 10000 avg 168.852 +/- 2.52 min 161 max 171 sum 1688525
[...]
While Kubernetes provides excellent orchestration for applications with out-of-the-box production-readiness in mind, managing systems well requires a deep understanding of how Kubernetes operates under the hood and how our applications behave during initialization and shutdown.