How to Configure Applications for High Availability in Kubernetes

Pods in Kubernetes are the smallest orchestration unit and are ephemeral by design: they can be terminated and recreated at any time, for example during:

  • Deployment/StatefulSet/DaemonSet/ReplicaSet updates or patches
  • Nodepool downscaling (compaction) or upgrades (cordoned and drained)

Kubernetes simplifies scheduling and orchestration, but developing and operating highly available applications on it comes with extra hurdles. Below I list some action items for HA and explain the motivations behind them.


Graceful Termination

When a Pod is evicted:

  • Each Pod container receives a SIGTERM.
  • The eviction controller waits up to terminationGracePeriodSeconds (default: 30 seconds).
  • Each remaining Pod container receives a SIGKILL.
  • The eviction controller waits for all containers to exit.

Handle the termination signal gracefully because your application may still be serving in-flight requests or need to clean up persistent storage before exit.
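As a sketch, the grace period and an optional preStop hook can be configured on the Pod spec. The names, image, and sleep duration below are illustrative, not prescriptive:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                        # illustrative name
spec:
  terminationGracePeriodSeconds: 60   # extend beyond the 30s default
  containers:
    - name: server
      image: example.com/my-app:1.0   # illustrative image
      lifecycle:
        preStop:
          exec:
            # Give load balancers time to stop routing to this Pod
            # before SIGTERM is delivered to the main process.
            command: ["sleep", "10"]
```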

Health Checks

Health checks are important to ensure that unhealthy Pods—unresponsive, about to be killed, or still initializing—are not selected into the serving set. Health checks can take the form of shell commands, HTTP probes, or TCP probes.

Readiness Probes

A Pod will only start serving traffic after its Readiness probe succeeds. For example, your web app should establish a connection to its database before serving its API.
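A minimal sketch of an HTTP Readiness probe; the /healthz path and port 8080 are assumptions about your application:

```yaml
containers:
  - name: web
    image: example.com/web:1.0   # illustrative image
    readinessProbe:
      httpGet:
        path: /healthz           # assumed health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3        # removed from endpoints after 3 failures
```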

Liveness Probes

Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted. Kubernetes provides Liveness probes to detect and remedy such situations.
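A Liveness probe can be sketched the same way; on repeated failure the kubelet restarts the container (the endpoint and port are again assumptions):

```yaml
containers:
  - name: web
    image: example.com/web:1.0   # illustrative image
    livenessProbe:
      httpGet:
        path: /healthz           # assumed health endpoint
        port: 8080
      periodSeconds: 10
      failureThreshold: 3        # container is restarted after 3 failures
```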

Startup Probes

Startup probes resemble Liveness probes but are intended for applications with slow initialization. Without Startup probes, one might configure a longer Liveness probe period for slow-starting applications, but doing so compromises the fast response to deadlocks that motivates Liveness probes in the first place.
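A sketch of combining the two: the Startup probe tolerates a long boot, and the Liveness probe takes over with a tight period once startup succeeds (names, paths, and thresholds are illustrative):

```yaml
containers:
  - name: legacy-app
    image: example.com/legacy:1.0   # illustrative image
    startupProbe:
      httpGet:
        path: /healthz              # assumed health endpoint
        port: 8080
      failureThreshold: 30          # allow up to 30 * 10s = 5 min to start
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10             # fast deadlock detection after startup
```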

Priority Class

If a Pod cannot be scheduled, the scheduler tries to preempt (evict) lower-priority Pods to make scheduling of the pending Pod possible. Run your mission-critical Pods at high priority. For example, logging, monitoring, backup, and a subset of application services (depending on business logic) should be high priority, but batch and cron jobs may not be.
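As a sketch, a PriorityClass is defined cluster-wide and then referenced from the Pod spec; the class name and value below are illustrative:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: mission-critical   # illustrative name
value: 1000000             # higher value = higher priority
globalDefault: false
description: "For Pods that must not be preempted by batch workloads."
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app             # illustrative name
spec:
  priorityClassName: mission-critical
  containers:
    - name: server
      image: example.com/my-app:1.0
```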

Resource Requirements and Quotas

Pod resources are CPU, memory, and ephemeral storage. Resource requirements include requests (the minimum needed to run) and limits (the maximum allowed). If you configure the same value for requests and limits, your Pods get the Guaranteed (highest) Quality of Service (QoS) class. If a node is under memory pressure, Burstable and BestEffort Pods are killed and rescheduled first.

However, choose resource requirements that are sensible for your node size. If your node has 64 vCPUs, asking for 60 is just not cloud-native.
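A sketch of requests equal to limits, which yields the Guaranteed QoS class (the quantities are illustrative):

```yaml
containers:
  - name: server
    image: example.com/my-app:1.0   # illustrative image
    resources:
      requests:
        cpu: "500m"                 # minimum guaranteed to the container
        memory: "256Mi"
        ephemeral-storage: "1Gi"
      limits:
        cpu: "500m"                 # equal to requests => Guaranteed QoS
        memory: "256Mi"
        ephemeral-storage: "1Gi"
```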

Replicated Pods

Load Balancing - Ingress and Service

Put your replicated Pods behind a load balancer to ensure your HTTP/TCP/UDP application remains accessible when Pod health or membership changes. Such a load balancer could be an Ingress or a Service, depending on whether you want the service accessible only inside the cluster network, within your company VPC, or from the public internet.
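A minimal sketch of a Service load-balancing across replicated Pods; the name, label, and ports are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: ClusterIP    # cluster-internal; use LoadBalancer for external access
  selector:
    app: my-app      # must match the Pod template labels
  ports:
    - port: 80       # port exposed by the Service
      targetPort: 8080   # port the containers listen on
```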

Leader Election

Perhaps not all of the replicated Pods should be considered active. For example, a Kubernetes controller on CRDs should logically be just one instance. Leader election allows such an application to still be replicated while preserving the single-instance abstraction.

Pod Disruption Budget (PDB)

A PDB limits the number of Pods of a replicated application that are down simultaneously from voluntary disruptions. For example, a quorum-based application would like to ensure that the number of replicas running is never brought below the number needed for a quorum.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper

Anti-Affinity Preference

An anti-affinity preference allows replicated Pods to be scheduled onto different nodes. Without such a preference, a single node failure could knock out multiple Pods that happen to be scheduled on the same node, which is possible since the default scheduler places Pods by resource availability.
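A sketch of a soft (preferred) anti-affinity rule in the Pod template, spreading replicas across nodes; the app label is illustrative:

```yaml
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: my-app   # illustrative label shared by the replicas
            topologyKey: kubernetes.io/hostname   # spread across nodes
```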


Autoscaling

Autoscaling allows the service backends to adjust to the request rate and to mitigate overloading.

Vertical Autoscaling

Vertical autoscaling modifies the resources allocated to your Pods dynamically.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       my-deployment
  updatePolicy:
    updateMode: Initial

If you need to limit the number of concurrent Pod restarts, use a Pod Disruption Budget. Be extra careful if you are using an ingress solution from your cloud vendor, such as GCLB: it may take minutes to update backend routes, which can cause routing failures and downtime when combined with frequent rescheduling.

Horizontal Autoscaling

Horizontal autoscaling dynamically modifies the replica count of a Deployment. By default, the HPA only supports scaling based on CPU or memory usage, which are often suboptimal for request-based or queue-based workloads. HPA supports autoscaling based on custom metrics, but doing so requires elevated permissions to register a Custom Metrics Adapter to serve a cluster-level API endpoint, which may not be an option if you are running in a multi-tenant environment.
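A sketch of a CPU-based HPA using the autoscaling/v2 API; the target Deployment, replica bounds, and utilization threshold are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment   # illustrative target
  minReplicas: 2          # keep at least 2 for availability
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```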

Monitoring and Alerts

API Uptime and Latency

This is probably the key KPI that your customers care about. Visibility into API uptime and latency allows quick responses to incidents. Managed solutions include Runscope, Pingdom, DataDog, Stackdriver, etc. It is also important to route alerts on these metrics to PagerDuty, Slack, or email.

Resource Usage - CPU, RAM, Disk, I/O

Anomalous resource usage is usually a precursor to outages. Typically, developers or the language runtime embed Prometheus endpoints in the application for scraping these metrics; DataDog agents make collecting them easy.

Unavailable Pods

You could have the right configuration for HA and still have unschedulable or unavailable Pods: image pulls failing, containers stuck in crash loops, the node autoscaler at its maximum, etc. Monitor and alert on unavailable Pods as well.