The AI industry, long driven by the belief that larger models are more powerful, may be facing a paradigm shift as cost pressures mount. Companies are increasingly treating smaller, cheaper models as a viable alternative. The shift could significantly alter the economics of AI, particularly for major labs such as OpenAI and Anthropic, which are preparing for IPOs. According to Coinbase co-founder Brian Armstrong, demand for intelligence is near infinite, but 80% of workloads will be running on 99% cheaper models within 12 to 18 months. '20% of workloads will still run on latest gen models where IQ maxing is important,' Armstrong wrote on X. If the prediction holds, it would upend the industry's habit of prioritizing the most advanced models over cost efficiency.
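To put Armstrong's split in rough numbers, the back-of-the-envelope sketch below computes total spend relative to running everything on the latest-generation model. It assumes the 80% and 20% shares are measured by token volume and that every workload otherwise costs the same; neither assumption comes from Armstrong's post, and the function name `blended_cost` is purely illustrative.

```python
def blended_cost(cheap_share: float, cheap_discount: float) -> float:
    """Spend relative to an all-frontier baseline (1.0 means unchanged).

    cheap_share:    fraction of workloads moved to the cheaper model
    cheap_discount: how much cheaper that model is (0.99 means 99% cheaper)
    Assumes equal-sized workloads, which is a simplification.
    """
    if not (0.0 <= cheap_share <= 1.0 and 0.0 <= cheap_discount <= 1.0):
        raise ValueError("shares and discounts must lie between 0 and 1")
    cheap_price = 1.0 - cheap_discount       # price of the cheap model, frontier = 1.0
    frontier_share = 1.0 - cheap_share       # workloads that stay on the frontier model
    return cheap_share * cheap_price + frontier_share * 1.0


relative = blended_cost(cheap_share=0.80, cheap_discount=0.99)
print(f"relative spend: {relative:.3f}")  # relative spend: 0.208
```

Under these assumptions, total spend falls to about 21% of the baseline, and nearly all of what remains goes to the 20% of workloads that stay on frontier models.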

Early tests suggest that cheaper models can take on much of the work without sacrificing quality, provided the system is arranged correctly. Legal AI tool Harvey, in a test with Fireworks AI, cut inference costs by a factor of three without affecting quality. The test combined Claude Opus with Fireworks’ GLM 5.1, reserving Opus for intensive tasks. 'Quality comes first, and in legal it always will,' said Harvey co-founder Gabe Pereyra. 'However, the definition of quality is evolving from simply using the most powerful model for everything, to using the best model that gets the right answer most efficiently.' The result fits a broader shift away from defaulting to the largest model and toward smaller, more cost-effective alternatives.
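The arrangement described above, a large model reserved for intensive tasks and a cheaper one for everything else, can be sketched as a simple router. This is only an illustration of the general idea, not Harvey's or Fireworks AI's actual system: the difficulty scores, the threshold, the task names, and the relative prices are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical relative prices per task; not real Claude Opus or GLM 5.1 pricing.
PRICE = {"large": 1.00, "small": 0.10}


@dataclass
class Task:
    name: str
    difficulty: float  # assumed score in [0, 1] from some upstream classifier


def route(task: Task, threshold: float = 0.7) -> str:
    """Send a task to the large model only if it meets the difficulty threshold."""
    return "large" if task.difficulty >= threshold else "small"


def total_cost(tasks: list[Task], threshold: float = 0.7) -> float:
    """Sum the hypothetical price of each task under the routing rule."""
    return sum(PRICE[route(t, threshold)] for t in tasks)


tasks = [
    Task("summarize filing", 0.2),
    Task("extract parties", 0.1),
    Task("draft novel argument", 0.9),
    Task("check citation format", 0.3),
]

baseline = len(tasks) * PRICE["large"]  # everything on the large model
routed = total_cost(tasks)              # only the hard task on the large model
print(f"all-large: {baseline:.2f}, routed: {routed:.2f}")
```

How much a setup like this saves depends entirely on how many tasks clear the threshold and on the real price gap between the two models. The harder problem, which this sketch does not address, is scoring difficulty reliably enough that quality does not suffer.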

The real divide in the AI industry is not between proprietary and open models, but between large and small ones. Users can save money by switching from GPT-5.5 to either DeepSeek’s V4 Flash or GPT-5.4-mini. A price war is underway between the big labs' in-house inference and independently served open-weight models. Which small model wins may vary, but the overall trend points toward a more cost-conscious approach. That trend challenges the scaling-first approach that has dominated the industry, as token prices rise and subsidies slow. Users are facing cost pressure for the first time, which raises the question of how the cost of training frontier models can be justified. It remains to be seen whether this pressure will drive enterprise users to smaller models or to other cost-saving measures.