Generative AI has evolved from a simple chat tool into a complex system in which token usage is a key business metric. Agentic workflows, which run autonomously for hours and consume more tokens, are driving a shift from flat-rate pricing to usage-based models. Providers are moving away from one-size-fits-all pricing toward models that reflect actual usage and performance. The shift is particularly evident at GitHub Copilot and Anthropic, which are adjusting their pricing strategies to the demands of agentic workflows. The stated aim is to align pricing with the value AI actually creates, not just with the raw number of tokens used. The token economy is also becoming more segmented, with different models and services tailored to specific use cases and performance requirements. This transformation is reshaping how AI is billed and valued in the market.
The most visible change is the overhaul of pricing models in response to growing usage. Starting June 1, 2026, GitHub Copilot is gradually moving to a usage-based model built on "GitHub AI Credits." These credits are tied to actual token usage and to the API price of each model. They apply wherever Copilot does more than suggest code, mainly in chat, CLI, and agent features. Standard completions remain free in paid plans.
Anthropic is also drawing a sharper line between normal use and agentic workflows. Claude Code, Claude Cowork, and Managed Agents turn Claude into a digital worker. Anthropic attributed bottlenecks in Claude Code to peak loads and contexts of up to one million tokens. The older plans fit heavy chat use but not always-on agent workflows.
How sharply usage differs between fields shows up in Anthropic's own analysis of its public API: nearly half of all agentic tool calls go to software development, the area that first benefited from agentic models and from scaffolding like Claude Code. Customer service, sales, finance, and e-commerce each account for just a few percent, and simple chat requests still dominate in those fields. That spread will likely widen as agentic workflows mature in office, research, finance, and legal tools.
As agentic workflows become more common, the token price alone is misleading. The most obvious mistake in the new token economy is a flat price comparison. GPT-5.5 costs $30 per million output tokens, DeepSeek V4 Pro 87 cents. That says little about actual costs in use. Beyond price per token, what matters is consumption per task. As with a car, the price of gas alone tells you nothing about what a drive from Berlin to Munich costs. You also have to know the distance and the mileage. A cheap model can get expensive if it needs more tries, fails more often, or requires more cleanup. A pricier model pays off when it reaches the goal with fewer loops and needs less human oversight.
Benchmarks and other analyses make this clear. GPT-5.5, for instance, was supposed to offset part of its higher list price with shorter answers. An analysis of real-world usage by OpenRouter still showed cost increases of 49 to 92 percent over its predecessor, depending on input length. Both factors can also rise at once, the token price and the number of tokens consumed, as with Google's Gemini 3.5 Flash. Here, the token price rose threefold over the predecessor, Gemini 3 Flash. In Artificial Analysis's evaluation, the model also needed more steps in the Intelligence Index run. The result: in that test, it ended up more expensive than Google's current flagship, Gemini 3.1 Pro.
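The gap between list price and cost per task can be made concrete with a small calculation. The following is a minimal sketch, not a provider's billing formula: it counts output tokens only and assumes that a failed attempt is simply retried until one succeeds, so the expected number of attempts is 1 divided by the success rate. The two list prices are the ones quoted above; the model labels, token counts per attempt, and success rates are hypothetical values chosen purely for illustration and are not measurements of any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    usd_per_million_output_tokens: float  # list price
    output_tokens_per_attempt: int        # hypothetical consumption per try
    success_rate: float                   # hypothetical, must be in (0, 1]


def expected_cost_per_task(model: ModelProfile) -> float:
    """Expected output-token cost in USD to finish one task successfully.

    Assumes independent retries until success, so the expected number
    of attempts is 1 / success_rate.
    """
    if not 0 < model.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = (
        model.usd_per_million_output_tokens
        * model.output_tokens_per_attempt
        / 1_000_000
    )
    return cost_per_attempt / model.success_rate


# Prices from the article; token counts and success rates are invented.
premium = ModelProfile("premium model", 30.00, 20_000, 0.9)
budget = ModelProfile("budget model", 0.87, 400_000, 0.5)

for model in (premium, budget):
    print(f"{model.name}: ${expected_cost_per_task(model):.3f} per task")
```

With these invented inputs, the model with the much lower token price comes out slightly more expensive per completed task, because it uses more tokens per attempt and needs more attempts. Different assumptions would reverse the result, which is the point: the comparison depends on consumption and reliability, not on the list price alone.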
Source: thedecoder