Work Around Max Count of Security Group Rules on EKS
AWS EKS clusters running on VPC networks rely on AWS Security Group (SG) rules to admit ingress traffic. But what happens when you hit the maximum number of rules allowed in your SG?
Background
LoadBalancer-type Service and Security Group Rules
Kubernetes users can expose a Service in two ways:
- Register with the Istio ingress gateways—the golden path for most tenants
- Create a dedicated LoadBalancer-type Service object, which tells the cloud provider to create a load balancer and set up health checks.
EKS recommends the aws-load-balancer-controller, which reacts to updates to LoadBalancer-type Service objects and sets up an NLB accordingly. For example, if a Service object exposes ports 80 and 443, the controller will create five Security Group (SG) rules on the EKS worker Nodes:
- allow ingress from source `0.0.0.0/0` to the NodePort corresponding to port 80
- allow ingress from source `0.0.0.0/0` to the NodePort corresponding to port 443
- allow the EKS zonal subnet in `us-west-2a` to reach the health-check NodePort
- allow the EKS zonal subnet in `us-west-2b` to reach the health-check NodePort
- allow the EKS zonal subnet in `us-west-2c` to reach the health-check NodePort
Note: the health check will fail on a Node if a) the Node does not host any target Pods, or b) none of the target Pods on that Node is ready, as determined by the Pods' readiness probes.
The SG rules are added to an SG that is attached to all worker Nodes in the given EKS cluster.
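To make this concrete, here is a minimal sketch of a LoadBalancer-type Service that would trigger those five rules. The name, namespace, selector, and target ports are hypothetical; the annotations mirror the NLB settings used in the full example later in this post.

```yaml
# Minimal sketch (hypothetical names): a LoadBalancer-type Service exposing
# ports 80 and 443 through an NLB managed by the aws-load-balancer-controller.
apiVersion: v1
kind: Service
metadata:
  name: demo                 # hypothetical
  namespace: demo            # hypothetical
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  selector:
    app: demo                # hypothetical
  ports:
  - name: http
    port: 80
    targetPort: 8080         # hypothetical container port
    protocol: TCP
  - name: https
    port: 443
    targetPort: 8443         # hypothetical container port
    protocol: TCP
# Kubernetes allocates a NodePort for each of the two ports; the controller
# then opens 0.0.0.0/0 to those two NodePorts and adds the three zonal-subnet
# rules for the health-check NodePort, i.e. the five SG rules listed above.
```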
Security Group Limits
For each AWS account, there are two quota limits on Security Groups:
- Max number of inbound rules per SG
- Max number of SGs per network interface
These limits can be adjusted, subject to the constraint that the product of the two quotas cannot exceed 1,000 (AWS doc). This means a network interface can never be subject to more than 1,000 SG rules in total. For example, raising the inbound-rules-per-SG quota to 200 means the SGs-per-interface quota can be at most 5.
Problem
Once your EKS cluster approaches the limit on SG rules, your ability to create new load balancers is restricted. You won't be able to perform a blue-green upgrade of a load balancer, because that requires provisioning two sets of load balancers simultaneously. The lack of headroom also means you can no longer onboard more applications that require a dedicated load balancer.
Solutions
The following solutions are not mutually exclusive. They can be used together.
Second dedicated SG for each node pool
Suppose your current setup is that all worker Nodes, regardless of node pool, have a shared SG named “worker” attached, and the aws-load-balancer-controller adds new rules to that “worker” SG.
You can keep the shared “worker” SG for common rules but create a new SG for each node pool, and use the per-pool SG for NLB ingress. You need to change the node pool's launch template to attach the new SG.
If you decide to keep letting the AWS LB controller manage SG rules for you, tag the new SG with `kubernetes.io/cluster/{{ .ClusterName }}: shared`. This is necessary when multiple security groups are attached to an ENI, so that the controller knows which SG to add new rules to. Because the existing “worker” SG already carries this tag, you need to create a duplicate SG, say “worker2”, which does NOT have the tag, and then attach both the “worker2” SG and the per-pool SG to the node pool.
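For illustration, if the node pools are managed with eksctl, attaching the two SGs could look roughly like the sketch below; the cluster name, node pool name, and SG IDs are placeholders, and if you manage launch templates directly you would add the SG IDs to the launch template's network settings instead.

```yaml
# Hypothetical eksctl-style sketch: attach the untagged "worker2" SG plus a
# per-pool SG (tagged kubernetes.io/cluster/<cluster>: shared) to one node pool.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster           # placeholder cluster name
  region: us-west-2
nodeGroups:
  - name: pool-a             # placeholder node pool name
    instanceType: m5.xlarge
    desiredCapacity: 3
    securityGroups:
      attachIDs:
        - sg-0aaaaaaaaaaaaaaaa   # placeholder ID: "worker2", WITHOUT the cluster tag
        - sg-0bbbbbbbbbbbbbbbb   # placeholder ID: per-pool SG, WITH the cluster tag,
                                 # so the controller adds NLB rules to it
```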
Optimize SG rules outside of the AWS LB controller
Recall that the aws-load-balancer-controller creates 5 inbound SG rules per envoy-ingress Service. We can optimize this by managing the SG rules ourselves and asking the controller to skip SG rule creation, reducing the need to 2 inbound SG rules per envoy-ingress Service.
Add the `service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules: "false"` annotation to the LoadBalancer-type Service object (the value must be a string). Documentation about this annotation is here.
Reserve 3 static NodePorts for each Service: one for the NLB to health-check the EKS nodes, one for frontend port 80, and one for frontend port 443. You can choose a static `healthCheckNodePort` if you set `externalTrafficPolicy: Local` (which comes with the benefit of preserving the client source IP address). The two regular NodePorts can be static regardless.
The two regular NodePorts should be consecutive, so that one SG rule can cover both. The `healthCheckNodePort` does not need to be consecutive with them, because the source range in its SG rule is different (i.e., it only allows the NLB to health-check the nodes).
Consider the following example:
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: acmecorp.com
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-type: external
+   service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules: "false"
  name: myapp
  namespace: myapp
spec:
  externalTrafficPolicy: Local
+ healthCheckNodePort: 30218
  ports:
  - name: https
+   nodePort: 30212
    port: 443
    protocol: TCP
    targetPort: 8095
  - name: http
+   nodePort: 30213
    port: 80
    protocol: TCP
    targetPort: 8089
  selector:
    app: myapp
  type: LoadBalancer
```
The optimized SG rules would be:
- allow source `0.0.0.0/0` to ingress to the NodePort range from `30212` to `30213` (a single rule covering both port 80 and port 443, replacing the two per-port rules)
- allow the EKS VPC network in region `us-west-2` to ingress to the health-check NodePort (a single rule replacing the three per-zone subnet rules for `us-west-2a`, `us-west-2b`, and `us-west-2c`)
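For illustration, the two self-managed rules could be written as CloudFormation resources along these lines; the worker-node SG parameter and the VPC CIDR are placeholders, and the port numbers match the example Service above.

```yaml
# Hypothetical CloudFormation sketch of the two self-managed inbound SG rules.
Parameters:
  WorkerNodeSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id    # the SG attached to the worker Nodes
Resources:
  MyAppNodePortsIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref WorkerNodeSecurityGroupId
      Description: myapp NLB traffic to the two consecutive NodePorts (frontend ports 80 and 443)
      IpProtocol: tcp
      FromPort: 30212
      ToPort: 30213
      CidrIp: 0.0.0.0/0
  MyAppHealthCheckIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref WorkerNodeSecurityGroupId
      Description: NLB health checks from anywhere inside the VPC
      IpProtocol: tcp
      FromPort: 30218
      ToPort: 30218
      CidrIp: 10.0.0.0/16                # placeholder: your VPC CIDR
```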
Raise max inbound rules per SG by reducing SG count per ENI
This solution picks a different point on the trade-off spectrum between the number of inbound rules per SG and the number of SGs per ENI.
The SG quotas apply to the whole AWS account, so any adjustment will affect other workloads in the same account. Before lowering the SGs-per-ENI quota, verify whether any ENI in the account already has the maximum number of SGs attached.
Build EKS clusters in a separate AWS account
Building new clusters and shifting tenants over is expensive. Try the other solutions first.