The real infrastructure challenge behind generative AI adoption

Cloud cost optimization used to be a fairly calm discipline. You identified underused resources, rightsized a few instances, cleaned up forgotten storage buckets, and called it a day. It wasn’t always easy, but the problem was at least predictable.

Then Generative AI arrived – and the whole thing got a lot more interesting.

Aman Aggarwal, COO, CloudKeeper

The numbers tell the story

The scale of what’s happening to cloud infrastructure is hard to overstate. According to Synergy Research Group, global cloud infrastructure revenues hit $106.9 billion in Q3 2025 alone – a 28% year-on-year increase and the first time quarterly cloud spend ever crossed the $100 billion mark. And the fuel behind this surge? Gen AI. GPU-as-a-Service revenues alone grew more than 200% year-on-year in that same quarter.

On the hardware side, the numbers are even more striking. IDC’s Worldwide Quarterly AI Infrastructure Tracker found that organizations increased spending on compute and storage hardware for AI deployments by 166% year-on-year in Q2 2025, reaching $82 billion in a single quarter. By 2029, IDC forecasts the global AI infrastructure market will hit $758 billion. Gartner projects that Gen AI model spending will grow 80.8% this year alone.

This is a step-change in how cloud infrastructure gets used, and most cost management practices haven’t caught up yet.

Why Gen AI breaks the old playbook

The challenge with Gen AI infrastructure isn’t just that it’s expensive. It’s that it’s expensive in ways that are hard to predict and harder to control with traditional approaches.

For years, cloud cost optimization was built around fairly well-behaved workloads. A web server has stable CPU needs. A database scales in line with transactions. You could model it, budget for it, and manage it with the FinOps tools that existed at the time.

Gen AI workloads do not behave like that. GPUs are significantly more expensive than standard compute, and an idle GPU is billed at the same rate as a busy one, so low utilisation wastes far more money than it does with CPUs. Agentic systems also run for longer periods, maintain context, and interact with several services at the same time. They do not stop between requests.
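
To make the utilisation point concrete, here is a minimal Python sketch of the arithmetic. It assumes a flat hourly price that is charged whether or not the GPU is doing useful work; the function name and the rate of 4.0 per GPU-hour are made up for illustration and are not real pricing.

```python
def effective_hourly_cost(hourly_rate: float, utilisation: float) -> float:
    """Cost per hour of useful GPU work, with utilisation as a fraction in (0, 1]."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    return hourly_rate / utilisation


# Illustrative rate only: 4.0 is a made-up price per GPU-hour, not a real quote.
busy = effective_hourly_cost(4.0, 1.0)
mostly_idle = effective_hourly_cost(4.0, 0.25)
print(busy, mostly_idle)
```

Under these assumptions, a GPU running at 25% utilisation costs four times the list rate for every hour of useful work it delivers.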

Most AI infrastructure today also runs in cloud environments. In fact, cloud and shared infrastructure account for about 84% of global AI infrastructure spending, with hyperscalers driving the majority of it.

For most enterprises, the question is no longer whether AI is increasing cloud costs. The real question is how those costs can be managed.

The paradox: The same technology is also the solution

Here is where the story takes a turn I find genuinely exciting – not as a talking point, but based on what we’re seeing in practice.

The same AI technology that is increasing infrastructure complexity is also becoming one of the most effective tools to control it. Agentic AI systems can continuously monitor cloud environments, analyse usage patterns, and identify anomalies before they grow into expensive problems.

Instead of analysing last month’s bill, these systems can observe workloads in real time and recommend or trigger corrective actions.

The organisations that manage AI infrastructure costs effectively, however, pair this capability with the right tooling and operational practices.

Observability and monitoring

GPU utilisation monitoring at the workload level, not just the instance level, using tools such as Grafana, AWS CloudWatch, or Google Cloud Monitoring, configured for accelerator metrics
Real-time spend dashboards that surface idle or underutilised capacity before it compounds into a large invoice
AI-enabled FinOps platforms that support natural language queries, allowing engineers, FinOps teams, and finance leaders to explore cloud cost insights through a single interface
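
As a rough illustration of workload-level monitoring, the Python sketch below averages GPU utilisation per workload and flags those that fall under a threshold. The `GpuSample` type, the workload names, and the 20% threshold are all hypothetical; in practice the samples would come from whichever monitoring tool exports your accelerator metrics.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GpuSample:
    workload: str            # hypothetical workload label (e.g. a job or pod name)
    utilisation_pct: float   # GPU utilisation for one sampling interval, 0-100


def underutilised_workloads(samples, threshold_pct=20.0):
    """Return {workload: mean utilisation} for workloads averaging below the threshold."""
    by_workload = defaultdict(list)
    for sample in samples:
        by_workload[sample.workload].append(sample.utilisation_pct)
    averages = {name: mean(values) for name, values in by_workload.items()}
    return {name: avg for name, avg in averages.items() if avg < threshold_pct}


# Made-up samples for two hypothetical workloads.
samples = [
    GpuSample("batch-embeddings", 5.0),
    GpuSample("batch-embeddings", 15.0),
    GpuSample("chat-inference", 70.0),
    GpuSample("chat-inference", 90.0),
]
flagged = underutilised_workloads(samples)
print(flagged)
```

Grouping by workload rather than by instance is the point here: a busy instance can still host a workload that barely touches its share of the GPU.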

Infrastructure and workload optimisation

Traffic-aware autoscaling on Kubernetes based on actual workload patterns rather than static provisioning, which can significantly reduce infrastructure usage for agentic workloads
Model routing that directs requests to the most appropriate model instead of always using the most powerful and most expensive one in the stack
Spot and preemptible instance strategies for bursty training workloads, while reserving committed capacity for stable inference demand
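
Model routing, for example, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names, complexity ceilings, and prices are invented, and the complexity score is assumed to come from some upstream classifier that is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical tier name, not a real model
    max_complexity: int        # highest complexity score this tier should handle
    cost_per_1k_tokens: float  # illustrative price, not a real quote


TIERS = [
    ModelTier("small", 3, 0.10),
    ModelTier("medium", 7, 0.50),
    ModelTier("large", 10, 2.00),
]


def route(complexity: int, tiers=TIERS) -> ModelTier:
    """Pick the cheapest tier whose ceiling covers the request's complexity score."""
    eligible = [tier for tier in tiers if tier.max_complexity >= complexity]
    if not eligible:
        raise ValueError(f"no tier can handle complexity {complexity}")
    return min(eligible, key=lambda tier: tier.cost_per_1k_tokens)


# Complexity scores are assumed to be produced upstream, on a 1-10 scale.
for score in (2, 5, 9):
    print(score, route(score).name)
```

The design choice is to default to the cheapest capable model and escalate only when the request needs it, rather than sending everything to the largest model.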

Financial governance

Continuous commitment optimisation using AI to evaluate whether reserved instance and savings plan coverage match evolving workload patterns
Workload tagging and ownership assignment before deployment so cloud costs can be traced to teams and business outcomes
Multi-cloud cost comparison across AWS, Azure, and Google Cloud to identify pricing advantages and improve negotiating leverage
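
Tagging pays off when the bill is rolled up by owner. The Python sketch below shows one way that roll-up might look; the line-item structure, the `team` tag key, and the costs are hypothetical and do not reflect any provider's billing export format.

```python
from collections import defaultdict


def cost_by_team(line_items, tag_key="team"):
    """Sum cost per owning team; items without the tag land in an 'untagged' bucket."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


# Made-up billing line items for illustration.
line_items = [
    {"resource": "gpu-node-1", "cost": 120.0, "tags": {"team": "search"}},
    {"resource": "gpu-node-2", "cost": 80.0, "tags": {"team": "search"}},
    {"resource": "gpu-node-3", "cost": 50.0, "tags": {}},
]
totals = cost_by_team(line_items)
print(totals)
```

The size of the "untagged" bucket is a useful signal in its own right: it is the share of spend that nobody can currently be held accountable for.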

What makes this different from earlier generations of cloud optimization tools is the move from advising to acting. An AI agent that flags an idle GPU cluster can also act on it: it can shift workloads or trigger a response with enough context to make the fix immediate. IDC forecasts that AI platforms spending will grow at a 48.5% CAGR through 2027, driven largely by the rise of Agentic AI systems and their ability to orchestrate complex infrastructure decisions autonomously.

What this means right now

The old model of cloud cost management – periodic reviews, manual rightsizing, static budgets – is not built for this environment. It’s not wrong; it’s just too slow.

The right response isn’t to pump the brakes on Gen AI and Agentic AI adoption. The right response is to turn those same capabilities inward, toward your own infrastructure. That means investing in real-time visibility into GPU utilization. It means building workload-aware autoscaling that understands how agentic systems actually behave. And it means treating cloud cost management as a continuous, AI-assisted practice rather than a quarterly exercise.

The organizations that pair Gen AI or Agentic AI with intelligent infrastructure management are the ones that will actually see returns from it. Those that don’t will find their cloud bills growing faster than their business outcomes.

The feedback loop – AI managing the cost of running AI – is where the next generation of cloud efficiency gets built. The companies that figure this out early will spend smarter, scale faster, and turn infrastructure discipline into a genuine competitive advantage.


About The Author

Jeevika
