
How to implement automatic scaling with Kubernetes?

To implement automatic scaling with Kubernetes, you can leverage Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler. These tools automatically adjust the number of pods or nodes based on metrics such as CPU utilization, memory usage, or custom application metrics.

1. Horizontal Pod Autoscaler (HPA)

HPA scales the number of pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization, memory usage, or custom metrics.

How it works:

  • HPA continuously monitors the resource usage of pods.
  • If the usage exceeds the defined threshold, HPA increases the number of pods.
  • If the usage drops below the threshold, HPA reduces the number of pods.
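The adjustment above follows the scaling rule documented for HPA: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal shell sketch with hypothetical numbers illustrates the arithmetic:

```shell
# Hypothetical observed values; the HPA rule is
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
current_replicas=4
current_cpu=80   # observed average CPU utilization, percent
target_cpu=50    # HPA target, e.g. --cpu-percent=50

# Ceiling division using integer arithmetic
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # ceil(4 * 80 / 50) = ceil(6.4) = 7
```

Because the result is rounded up, HPA scales out slightly aggressively and scales in conservatively.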

Example:

kubectl autoscale deployment/my-app --cpu-percent=50 --min=1 --max=10

This command keeps the my-app deployment between 1 and 10 pods, adding or removing replicas to hold average CPU utilization near the 50% target.

For custom metrics (e.g., requests per second), note that Metrics Server only provides CPU and memory data; you need a custom metrics adapter (such as Prometheus Adapter) that implements the custom metrics API. You can then define an HPA that combines resource and custom metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: 100
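Assuming the manifest above is saved as my-app-hpa.yaml (a hypothetical file name), you would apply it and inspect the autoscaler's state like so:

```shell
# Apply the HPA manifest (file name is illustrative)
kubectl apply -f my-app-hpa.yaml

# Check current vs. target metric values and replica counts
kubectl get hpa my-app-hpa

# Show recent scaling events and conditions for troubleshooting
kubectl describe hpa my-app-hpa
```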

2. Cluster Autoscaler

Cluster Autoscaler automatically adjusts the number of nodes in a Kubernetes cluster based on pending pods. If pods cannot be scheduled due to insufficient resources, Cluster Autoscaler adds new nodes. If nodes are underutilized, it removes them.

How it works:

  • When pods are pending due to resource constraints, Cluster Autoscaler provisions new nodes.
  • When nodes stay underutilized for a period and their pods can be rescheduled onto other nodes, it terminates them to save costs.

Example (Cloud Provider Integration):

  • On cloud platforms, Cluster Autoscaler works with managed Kubernetes services (like Tencent Kubernetes Engine) to automatically adjust node pools.
  • You typically enable it by deploying Cluster Autoscaler (usually as a Deployment in the kube-system namespace) with the correct cloud provider configuration.

Example Deployment (Simplified):

kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/<provider>/examples/cluster-autoscaler-autodiscover.yaml

(Replace <provider> with your cloud provider, e.g., tencentcloud.)

Best Practices:

  • Use Metrics Server (required for CPU/memory-based HPA):
    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
    
  • Combine HPA and Cluster Autoscaler for full auto-scaling (pods + nodes).
  • Set resource requests and limits in pod specs; HPA computes utilization as a percentage of the requested amount, so missing requests break CPU/memory-based scaling.
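As an illustration of the last point, a Deployment with requests and limits set might look like this (the image name and resource values are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest   # hypothetical image
        resources:
          requests:
            cpu: 250m      # HPA utilization is measured against this value
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
```

With a 250m CPU request, an HPA target of 50% means HPA tries to keep average usage near 125m per pod.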

For Tencent Cloud users, Tencent Kubernetes Engine (TKE) provides built-in support for HPA and Cluster Autoscaler, along with auto-scaling node pools for seamless scaling.

Example (TKE Auto Scaling Node Pool):

  • In the TKE console, configure a node pool with auto-scaling enabled, setting min/max node counts.
  • Combine with HPA for pod-level scaling.

This setup ensures your Kubernetes workloads scale efficiently based on demand while optimizing costs.