Dive Brief:
- Cloud costs spiked for Kubernetes workloads last year, according to a Cast AI analysis of 4,000 clusters running on AWS, Microsoft Azure and Google Cloud. Even the price of AWS spot instances, spare compute capacity the hyperscaler offers at a steep discount, increased 23% in AWS’ six most popular U.S. regions between 2022 and 2023, the Wednesday report said.
- As the cost of running containerized workloads increased, companies massively overprovisioned, compounding the budget hit. Customers with clusters of 50 CPUs or more used only 13% of purchased compute, the analysis found.
- “Out of $100 you pay to AWS, $77 is a gift to Mr. Bezos — you pay for it, but you don’t use it,” Laurent Gil, Cast AI co-founder and CPO, told CIO Dive.
Dive Insight:
As organizations migrate more workloads, the pressure to keep a lid on cloud costs intensifies.
Enterprises prioritized optimization in 2023, leaning on providers to deliver better usage data and implementing FinOps practices to track and trim spending.
Kubernetes, the open source container orchestration platform enterprises use to lift, shift and then modernize migrated applications, remains a major drain on cloud spend, according to Cast AI. Three primary culprits push spending upward, the report said: overprovisioning, excessive headroom and underutilization of low-priced spot instances.
“About half your cloud costs come down either to compute you're paying for but have never used or provisioning a more expensive machine than you need when there were others available that were cheaper,” Gil said.
Kubernetes adoption is gaining momentum, according to Gartner. The analyst firm expects nearly all global enterprises to be running containerized workloads by 2029, up from just half last year. Containers will support one-third of all enterprise applications by that time, the firm forecast in a January report.
Cast AI’s analysis found efficiency varied little between the two largest cloud providers, AWS and Microsoft Azure: both had CPU utilization rates of just 11%. Google Cloud fared slightly better, but CPU requests still far exceeded application needs, with customers consuming only 17% of what they purchased through the third-largest hyperscaler.
Modest overprovisioning has a purpose, Gil acknowledged. It can be difficult to predict the compute needed to handle surges in consumer app usage, particularly during promotional events, and innovation can suffer if developers lack sufficient resources.
But teams commonly requisition 30% or 40% more compute than they use, just to provide a safety net, according to Gil.
“In some extreme cases, for example a trading platform when the market opens, the amount of compute is so high that organizations will overprovision by 400% or more,” Gil said.
Wasted spend was highest among clusters of 50 or more CPUs but declined as cluster size increased. Utilization climbed from 13% to 17% for clusters of 1,000 or more CPUs, and massive clusters of 30,000 or more CPUs reached a 44% utilization rate, though they made up only 1% of analyzed workloads.
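For teams that want to gut-check those figures against their own clusters, the underlying arithmetic is simple. The short Python sketch below is illustrative only, not Cast AI’s methodology, and assumes you can export provisioned and actually used CPU counts from your own monitoring; the sample numbers mirror the 13% utilization rate the report cites.

```python
# Illustrative sketch (not Cast AI's methodology): how much provisioned CPU
# capacity goes unused, given provisioned vs. actually used CPU counts.

def wasted_share(provisioned_cpus: float, used_cpus: float) -> float:
    """Fraction of provisioned CPU capacity that goes unused."""
    return 1 - (used_cpus / provisioned_cpus)

# Example: a 100-CPU cluster averaging 13 CPUs of real usage, matching the
# 13% utilization rate reported for clusters of 50 CPUs or more.
provisioned = 100
used = 13
print(f"Utilization: {used / provisioned:.0%}, "
      f"unused capacity: {wasted_share(provisioned, used):.0%}")
# -> Utilization: 13%, unused capacity: 87%
```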