Service API Changes: Prefer Blue-green Update to Rolling Update
Summary
To achieve a zero-downtime service update, a Kubernetes rolling update requires the service API to be both forward and backward compatible. Forward compatibility is hard to achieve, if it makes sense at all. A blue-green update requires only backward compatibility to ensure zero downtime. Blue-green update is not supported by the Kubernetes core API, but it is achievable with simple scripts or with a CRD plus a controller.
The Deployment object in Kubernetes supports rolling updates in the hope of providing zero-downtime service updates. A rolling update scales out a new replica set of pods running containers with the new version while shrinking the old replica set. However, rolling updates are insufficient to avoid downtime when the service API changes. Because both versions of the servers and both versions of the clients coexist during the update, the API must be both forward and backward compatible.
As the new replica set is scaling out, the Service label selector matches all the available pods across both replica sets, which means both versions of the server are serving requests. Imagine these are web servers: v1 servers hand out a v1 JS client that runs in browsers, and v2 servers hand out a v2 client. During the update, many browsers still run the v1 client, but the service has started giving out v2 clients. The Service object load balances incoming requests across both v1 and v2 servers. Hence, a request from a v2 client could be routed to a v1 server (which requires forward compatibility), and a request from a v1 client could be routed to a v2 server (which requires backward compatibility). The semantics of forward compatibility are always hairy. One could “gracefully” handle an unsupported request by ignoring it, returning a not-found status code, or returning a response asking for a retry in the hope of landing on a server with the new version (and good luck with sticky sessions). None of these is satisfactory.
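The argument above can be checked with a small enumeration. The following is a minimal sketch, not a model of Kubernetes itself: it assumes versions are plain integers, that a load balancer can pair any live client with any live server, and the function names are made up for illustration.

```python
from itertools import product

def reachable_pairs(client_versions, server_versions):
    """Every (client, server) pairing a load balancer could produce."""
    return set(product(client_versions, server_versions))

def required_compatibility(pairs):
    """Name the kinds of compatibility the given pairings demand."""
    needs = set()
    for client, server in pairs:
        if client > server:
            needs.add("forward")   # an old server must cope with a newer client
        elif client < server:
            needs.add("backward")  # a new server must cope with an older client
    return needs

# Rolling update: v1 and v2 servers sit behind the Service at the same time,
# while browsers run both v1 and v2 clients.
rolling = reachable_pairs({1, 2}, {1, 2})

# Blue-green: before the switch only v1 servers (and v1 clients) exist;
# after the switch only v2 servers receive traffic, from v1 and v2 clients.
blue_green = reachable_pairs({1}, {1}) | reachable_pairs({1, 2}, {2})

print(sorted(required_compatibility(rolling)))     # ['backward', 'forward']
print(sorted(required_compatibility(blue_green)))  # ['backward']
```

The only pairing that demands forward compatibility is a v2 client reaching a v1 server, and that pairing exists only while both server versions receive traffic.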
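Blue-green update does not require forward compatibility because it ensures that only a single version of the servers receives traffic at any time. A new Deployment is created alongside the old one, and once its pods are ready the Service label selector is updated to point at it.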
Older versions of clients could still be around after the update is complete, so backward compatibility is still required, but clients of the latest version will never be handled by servers of an older version.
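As a rough illustration of the switch, the sketch below assembles the kubectl commands one might run. It only builds and prints the commands and does not execute them. The Service name `web`, the Deployment name `web-v2`, the manifest file `web-v2.yaml`, and the `app` and `version` labels are all assumptions made for this example; substitute whatever your own manifests use.

```python
import json
import shlex

def selector_patch(app, version):
    """Patch body that repoints a Service at pods of a single version.

    The label keys "app" and "version" are assumed, not mandated by Kubernetes.
    """
    return {"spec": {"selector": {"app": app, "version": version}}}

def blue_green_steps(service, new_deployment, manifest, app, version):
    """Return the commands for a by-hand blue-green switch, in order."""
    patch = json.dumps(selector_patch(app, version))
    return [
        # 1. Create the new (green) Deployment next to the old one.
        ["kubectl", "apply", "-f", manifest],
        # 2. Wait until the new pods have rolled out.
        ["kubectl", "rollout", "status", f"deployment/{new_deployment}"],
        # 3. Flip the Service selector so traffic goes to the new version only.
        ["kubectl", "patch", "service", service, "-p", patch],
    ]

steps = blue_green_steps("web", "web-v2", "web-v2.yaml", "web", "v2")
for step in steps:
    print(shlex.join(step))  # shell-quoted, ready to paste into a terminal
```

Keeping the old Deployment around until the new one has proven itself means a rollback is just another selector patch pointing back at the old version label.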
Blue-green update is easy even by hand. Alternatively, one could leverage the BlueGreenDeployment CRD (CustomResourceDefinition) and deploy a controller handling updates to the BlueGreenDeployment object. Check out https://github.com/google/blue-green-deployment-controller.