AI Infrastructure · April 22, 2025 · 10 min read

Cost Attribution for AI Agents: Why Tool-Call-Level FinOps Is the Missing Layer

A thought leadership piece on why AI cost attribution needs to happen at the protocol level, not as an afterthought. References MCP, Kagenti, and the emerging need for agent-aware FinOps.

FinOps, OpenCost, Prometheus, MCP, Kubernetes

The Cost Visibility Gap in AI Infrastructure

Cloud FinOps has largely solved the cost attribution problem for traditional infrastructure. Tools like OpenCost can tell you how much a Kubernetes namespace costs per hour. AWS Cost Explorer breaks down spending by service, tag, and account. We have mature frameworks for attributing compute, storage, and network costs to teams and products.

AI agents break all of this.

When an AI agent makes a tool call through MCP (Model Context Protocol) that triggers a chain of LLM inference, vector database queries, and API calls across multiple services, the cost of that single user interaction is invisible. It is spread across inference endpoints, embedding services, retrieval systems, and orchestration layers with no unified attribution.

This is not a theoretical problem. Organizations deploying AI agents today are discovering that their AI infrastructure costs are growing faster than their ability to understand them.

Why Existing FinOps Tools Fall Short

Traditional FinOps operates at the infrastructure layer: how much compute does this pod use, how much storage does this PVC consume, how many API calls does this service make. This works when the relationship between infrastructure usage and business value is relatively direct.

AI agents introduce a new abstraction layer that breaks this relationship:

Multi-hop cost chains: A single agent action might involve an LLM call ($0.03), a vector search ($0.001), a web search API call ($0.005), another LLM call to synthesize results ($0.02), and a tool execution. The four priced hops alone add up to about $0.056, with the tool execution on top. The total cost of that action is the sum of all hops, but infrastructure-level FinOps tools do not track it end-to-end.

Non-deterministic resource usage: The same prompt can generate wildly different costs depending on output token count, tool call decisions, and retry behavior. Traditional capacity planning assumes relatively predictable resource consumption patterns.

Nested agent delegation: With frameworks like Kagenti and multi-agent systems, one agent can delegate work to sub-agents, each consuming their own inference and tool-call budgets. The cost tree can be arbitrarily deep.

Protocol-level routing decisions: When an inference gateway like Kortex routes a request from GPT-4 to Claude based on cost optimization rules, the actual cost depends on runtime routing decisions that are invisible to infrastructure-level monitoring.
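To make the multi-hop arithmetic concrete, here is a minimal sketch that sums the illustrative per-hop prices from the list above. The hop names are hypothetical labels, and the tool execution is left out because no price is given for it.

```python
from decimal import Decimal

# Illustrative per-hop costs in USD, taken from the example figures above.
# The hop names are hypothetical labels, not identifiers from any real system.
hops = [
    ("llm_plan", Decimal("0.03")),
    ("vector_search", Decimal("0.001")),
    ("web_search", Decimal("0.005")),
    ("llm_synthesize", Decimal("0.02")),
]


def total_cost(hops):
    """Sum the cost of every hop in a single agent action."""
    return sum((cost for _, cost in hops), Decimal("0"))


action_total = total_cost(hops)
print(action_total)  # 0.056
```

Decimal is used instead of float so that sub-cent amounts add up exactly when many small hops are aggregated.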

What Tool-Call-Level Attribution Looks Like

The right answer is attribution at the protocol level, not the infrastructure level. Every tool call, every LLM invocation, every retrieval query should carry cost metadata that propagates through the call chain.

Here is what this looks like concretely:

Request-scoped cost tracking: When an agent receives a user request, it creates a cost context that follows every downstream call. Each LLM invocation, tool call, and sub-agent delegation appends its cost to this context. When the request completes, the total cost is the sum of the entire tree.

MCP-aware cost headers: The Model Context Protocol defines how agents interact with tools. Cost attribution should be a first-class concept in this protocol. Every tool call response should include a cost field, and every MCP server should report the cost of fulfilling a request.

Team and feature attribution: Cost contexts should carry attribution metadata (team ID, feature ID, user segment) that enables chargeback at the business level. This is the same pattern that works for traditional cloud costs, extended to the agent layer.

Budget enforcement at the agent level: Rather than setting infrastructure-level quotas (which are too coarse), budgets should be enforced per agent, per user, or per conversation. An agent that has exhausted its budget should gracefully degrade rather than silently running up costs.
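The four ideas above can be sketched as a single request-scoped object. This is a rough illustration under stated assumptions, not how Kortex or any MCP implementation works: `CostContext`, `BudgetExceeded`, and the field names are all hypothetical, and `charge` is assumed to be called with an estimated cost before the downstream call is made.

```python
from dataclasses import dataclass, field
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a charge would push a request past its budget."""


@dataclass
class CostContext:
    # Attribution metadata; these field names are illustrative assumptions.
    team_id: str
    feature_id: str
    budget_usd: Decimal
    entries: list = field(default_factory=list)

    @property
    def spent_usd(self) -> Decimal:
        return sum((cost for _, cost in self.entries), Decimal("0"))

    @property
    def remaining_usd(self) -> Decimal:
        return self.budget_usd - self.spent_usd

    def charge(self, source: str, cost_usd: Decimal) -> None:
        """Record one downstream call, refusing it if the budget would be exceeded."""
        if cost_usd > self.remaining_usd:
            raise BudgetExceeded(
                f"{source} needs {cost_usd}, only {self.remaining_usd} left"
            )
        self.entries.append((source, cost_usd))


ctx = CostContext(team_id="search", feature_id="answer-bot", budget_usd=Decimal("0.05"))
ctx.charge("llm:plan", Decimal("0.03"))
ctx.charge("tool:vector_search", Decimal("0.001"))
try:
    ctx.charge("llm:synthesize", Decimal("0.02"))
    degraded = False
except BudgetExceeded:
    # Graceful degradation: take a cheaper path instead of silently overspending.
    degraded = True
```

In this run the third charge is refused because only $0.019 of the $0.05 budget remains, so the caller can fall back to a cheaper model or a shorter answer.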

The Architecture I Am Building Toward

My work on Kortex (inference gateway) and the AI FinOps Platform converges on this problem. Here is how the pieces fit together:

Kortex sits at the inference layer, tracking per-request costs with team and feature attribution. It knows the cost of every LLM call because it mediates between applications and inference backends. This is the data plane for cost tracking.

The AI FinOps Platform aggregates cost data from Kortex, OpenCost (for infrastructure-level costs), and NVIDIA DCGM (for GPU utilization). It provides the analytics layer: cost-per-inference, cost-per-team, anomaly detection, and budget forecasting.

The missing piece is the agent orchestration layer. When an agent framework like Kagenti or LangGraph orchestrates a multi-step workflow, it needs to propagate cost context through every step and report the total cost of the workflow back to the FinOps platform.

This is not a single-tool problem. It is an ecosystem problem that requires cost awareness at every layer of the stack.
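As a rough illustration of the analytics layer described above, the following sketch groups cost records by attribution keys. The record shape and the `cost_by` helper are assumptions made for this example; they do not reflect the actual schema of Kortex, OpenCost, or the AI FinOps Platform.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical cost records, as a gateway or agent framework might emit them.
records = [
    {"team": "search", "feature": "answer-bot", "source": "llm", "cost_usd": Decimal("0.03")},
    {"team": "search", "feature": "answer-bot", "source": "vector_db", "cost_usd": Decimal("0.001")},
    {"team": "support", "feature": "triage", "source": "llm", "cost_usd": Decimal("0.02")},
]


def cost_by(records, *keys):
    """Group cost records by the given attribution keys and sum each group."""
    totals = defaultdict(lambda: Decimal("0"))
    for rec in records:
        totals[tuple(rec[k] for k in keys)] += rec["cost_usd"]
    return dict(totals)


per_team = cost_by(records, "team")
per_team_feature = cost_by(records, "team", "feature")
```

The same grouping works for any attribution dimension, as long as every layer stamps its records with the same keys.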

Why This Should Be Protocol-Level

Some will argue that cost tracking can be added as an observability concern, bolted on via OpenTelemetry spans or custom middleware. I disagree, for three reasons:

Accuracy requires provider participation: The cost of an LLM call depends on input tokens, output tokens, and model-specific pricing. Only the provider (or a gateway that intercepts the response) can calculate this accurately. Estimating costs from the outside based on request size is unreliable.

Budget enforcement requires synchronous awareness: If cost tracking is asynchronous (collect data, analyze later), you cannot enforce budgets in real time. An agent needs to know its remaining budget before deciding whether to make another expensive tool call.

Multi-agent systems need cost propagation: When Agent A delegates to Agent B, which delegates to Agent C, the cost of B and C should roll up to A's budget. This requires a cost context that is part of the agent communication protocol, not a sidecar concern.
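The three reasons above can be sketched together: a cost computed from provider-reported token counts, a synchronous check before spending, and roll-up through a delegation chain. The `AgentBudget` class and `llm_call_cost` function are hypothetical, and the per-1k-token prices are placeholders rather than any provider's real rates.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


def llm_call_cost(input_tokens: int, output_tokens: int,
                  usd_per_1k_input: Decimal, usd_per_1k_output: Decimal) -> Decimal:
    """Cost of one LLM call from the token counts the provider reports."""
    return (Decimal(input_tokens) * usd_per_1k_input
            + Decimal(output_tokens) * usd_per_1k_output) / Decimal(1000)


@dataclass
class AgentBudget:
    name: str
    limit_usd: Decimal
    parent: Optional["AgentBudget"] = None
    spent_usd: Decimal = Decimal("0")

    def can_afford(self, cost_usd: Decimal) -> bool:
        """Synchronous check: this agent and every ancestor must have headroom."""
        node = self
        while node is not None:
            if node.spent_usd + cost_usd > node.limit_usd:
                return False
            node = node.parent
        return True

    def charge(self, cost_usd: Decimal) -> None:
        """Add the cost to this agent and roll it up to all ancestors."""
        node = self
        while node is not None:
            node.spent_usd += cost_usd
            node = node.parent


# Agent A delegates to B, which delegates to C.
a = AgentBudget("A", Decimal("0.10"))
b = AgentBudget("B", Decimal("0.05"), parent=a)
c = AgentBudget("C", Decimal("0.05"), parent=b)

# Placeholder prices per 1k tokens, not real provider rates.
cost = llm_call_cost(1000, 500, Decimal("0.01"), Decimal("0.03"))
if c.can_afford(cost):
    c.charge(cost)
```

After this call, the $0.025 spent by C also appears in B's and A's totals, which is the roll-up behavior a sidecar collecting data after the fact cannot enforce before the next call.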

Practical Steps for Today

While the ecosystem matures, here is what you can do now:

Instrument your inference gateway: If you are running an inference proxy (Kortex, LiteLLM, or custom), add per-request cost tracking with attribution headers. This gives you the data plane.

Add cost metadata to your agent framework: Whatever agent framework you use, add a cost accumulator to the agent's execution context. Log the total cost of each agent invocation alongside the result.

Build cost dashboards that show agent-level spending: Not just infrastructure costs, but cost-per-agent-action, cost-per-user-session, and cost-per-feature. This is what drives optimization decisions.

Set budget alerts, not just spending alerts: An alert that fires when total AI spend exceeds $10K/month is too late. Alert when a single agent conversation exceeds $5, or when a team's daily agent spend is trending 2x above baseline.
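The alerting step can be sketched as two small rules using the thresholds mentioned above. The function names are hypothetical, "trending 2x above baseline" is interpreted here as a projected daily spend above twice the baseline, and the projection is a naive linear one chosen for illustration.

```python
from decimal import Decimal

# Thresholds taken from the examples above; tune them for your own workloads.
CONVERSATION_LIMIT_USD = Decimal("5")
BASELINE_MULTIPLIER = Decimal("2")


def conversation_alert(conversation_cost_usd: Decimal) -> bool:
    """Fire when a single agent conversation exceeds the per-conversation limit."""
    return conversation_cost_usd > CONVERSATION_LIMIT_USD


def projected_daily_spend(spent_so_far_usd: Decimal, hours_elapsed: int) -> Decimal:
    """Naive linear projection of today's spend to a full 24 hours.

    hours_elapsed must be greater than zero.
    """
    return spent_so_far_usd * Decimal(24) / Decimal(hours_elapsed)


def trend_alert(spent_so_far_usd: Decimal, hours_elapsed: int,
                baseline_daily_usd: Decimal) -> bool:
    """Fire when projected daily spend exceeds twice the team's baseline."""
    projected = projected_daily_spend(spent_so_far_usd, hours_elapsed)
    return projected > BASELINE_MULTIPLIER * baseline_daily_usd


# A team with a $150/day baseline that has spent $90 in the first 6 hours
# projects to $360 for the day, which is above the $300 threshold.
team_is_trending_high = trend_alert(Decimal("90"), 6, Decimal("150"))
```

A real deployment would likely replace the linear projection with something that accounts for daily traffic patterns, but the shape of the rule stays the same.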

The Bet

I am betting that cost attribution will become a first-class concern in AI agent protocols within the next 18 months. The MCP specification will likely add cost reporting. Agent frameworks will add budget-aware execution. Inference gateways will standardize cost headers.

The organizations that instrument their AI infrastructure for cost visibility now will have a significant advantage when this happens. They will have the data, the dashboards, and the organizational muscle to optimize AI spending as it scales.

The organizations that treat AI costs as an undifferentiated cloud bill will find themselves in the same position many companies were in with cloud spending in 2015: growing fast, with no visibility into what is driving the growth.

This post reflects my work building Kortex and the AI FinOps Platform. If you are tackling similar problems, I would like to compare notes on approaches to agent-level cost tracking.