Date: 2025-10-09

Accurately estimate your workload

When you deploy a workload on Kubernetes, you declare how much CPU, memory, and (optionally) ephemeral storage it needs in the form of resource requests. If you underestimate, your workload runs short of resources and either crashes (memory) or slows down (CPU). If you overestimate, you pay for resources you don't use.
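As a minimal sketch of what that declaration looks like, the Python snippet below builds the `resources.requests` stanza of a container spec and prints it as JSON. The helper name `container_requests`, the container name and image, and all values are illustrative assumptions, not recommendations.

```python
import json


def container_requests(cpu, memory, ephemeral_storage=None):
    """Build the resources stanza for one container (hypothetical helper)."""
    requests = {"cpu": cpu, "memory": memory}
    if ephemeral_storage is not None:
        # Ephemeral storage is optional, so include it only when given.
        requests["ephemeral-storage"] = ephemeral_storage
    return {"requests": requests}


# Illustrative values: 250 millicores of CPU, 256 MiB of memory,
# and 1 GiB of ephemeral storage.
container = {
    "name": "web",                # hypothetical container name
    "image": "example/web:1.0",   # hypothetical image
    "resources": container_requests("250m", "256Mi", "1Gi"),
}

print(json.dumps(container, indent=2))
```

The same fields appear under `spec.containers[].resources` in a pod manifest, whether it is written in YAML or JSON.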

The Kubernetes HorizontalPodAutoscaler automatically scales the number of pod replicas in your workloads up and down based on CPU and memory metrics. The cluster autoscaler adds and removes nodes so that all pods can be scheduled. However, you still need to ensure that your pods use their resources without unnecessary waste.
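For reference, the Kubernetes documentation describes the HorizontalPodAutoscaler's core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The sketch below reproduces only that arithmetic. The function name and the sample numbers are assumptions, and the real controller also handles missing metrics, minimum and maximum replica bounds, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Simplified HPA arithmetic: scale replicas in proportion to metric/target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 replicas averaging 90% CPU utilization against a 60% target.
print(desired_replicas(4, 90, 60))
# 4 replicas averaging 63% against a 60% target falls inside the tolerance.
print(desired_replicas(4, 63, 60))
```

Utilization targets are computed as a percentage of the pods' requests, so an inaccurate request also skews autoscaling decisions.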

Perfect utilization is impossible since workloads don't use constant amounts of CPU and memory over time. However, by using tools like Goldilocks and KRR, you can determine the optimal range of resource use and adjust your pods' resource requests accordingly. You can use the Kubernetes VerticalPodAutoscaler or build your own solution to dynamically adjust resource requests.
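One common way to turn observed usage into a request is to take a high percentile of the samples and add some headroom. The sketch below illustrates that idea only. It is not the algorithm that Goldilocks, KRR, or the VerticalPodAutoscaler actually use, and the function names, the 95th percentile, the 15% headroom, and the sample data are all assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def recommend_request(samples, pct=95, headroom_pct=15):
    """Suggest a request: the chosen percentile of usage plus a safety margin."""
    return math.ceil(percentile(samples, pct) * (100 + headroom_pct) / 100)


# Hypothetical CPU usage samples in millicores, one per scrape interval.
cpu_millicores = [120, 135, 110, 180, 150, 140, 400, 130, 125, 160]

print(f"suggested CPU request: {recommend_request(cpu_millicores)}m")
```

A lower percentile yields a tighter request that saves more but leaves less room for spikes, which is the trade-off to tune per workload.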

Use limits that are greater than resource requests

Another technique is to use limits that are greater than resource requests. This allows you to pack more pods onto each node, reducing your total node count and costs.

Here's how this works: The scheduler reserves a pod's requested resources on its node, so they are guaranteed to be available whenever the pod needs them. However, the pods scheduled to a node may use more resources than they requested (up to their limits). This means that if a node has spare capacity, its pods can take advantage of it.

By setting conservative requests but higher limits, you can schedule more pods per node, because the scheduler places pods based on their requests, while still allowing them to burst when needed. The combined limits of all the pods on a node may then exceed the node's capacity, so at any given time some pods can use more than they requested, but not all of them can reach their limits at once.
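The arithmetic behind this can be sketched as follows. Pods are packed according to their requests, so the number that fits is the node's allocatable capacity divided by the per-pod request, while the summed limits show how far the node is overcommitted. The node size, the per-pod values, and the function names are hypothetical, and the model ignores system pods, DaemonSets, and per-node pod-count caps.

```python
def pods_per_node(allocatable, per_pod):
    """Pods that fit when every resource is packed by the given per-pod amount."""
    return min(allocatable[r] // per_pod[r] for r in per_pod)


def overcommit_ratio(allocatable, limit, pod_count):
    """Sum of limits divided by allocatable capacity, per resource."""
    return {r: pod_count * limit[r] / allocatable[r] for r in limit}


# Hypothetical node: 4 CPU cores (in millicores) and 16 GiB of memory (in MiB).
node = {"cpu_m": 4000, "memory_mi": 16384}

request = {"cpu_m": 250, "memory_mi": 512}    # conservative requests
limit = {"cpu_m": 1000, "memory_mi": 1024}    # higher limits for bursting

by_request = pods_per_node(node, request)  # packing by requests
by_limit = pods_per_node(node, limit)      # packing as if requests equaled limits
print(by_request, by_limit, overcommit_ratio(node, limit, by_request))
```

With these numbers, packing by requests fits four times as many pods as packing by limits. The price is a CPU overcommit: if every pod bursts at once, they compete for the same cores.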