Leading AI companies like OpenAI, Anthropic, and Google are encountering significant challenges in developing their next-generation AI models. Progress has fallen short of expectations, with models like OpenAI's "Orion" and Anthropic's "Claude 3.5 Opus" failing to meet internal benchmarks, resulting in delayed releases.
Reports from The Information, Bloomberg, and Reuters revealed that while "Orion" outperforms OpenAI's current large language models (LLMs), the improvement falls well short of the leap from GPT-3.5 to GPT-4, and the model is comparatively weak on certain tasks, such as programming. Similarly, Google's upcoming Gemini iteration and Anthropic's Claude 3.5 Opus have demonstrated only incremental improvements despite substantial investments in model scaling.
Anthropic originally planned to release Claude 3.5 Opus by late 2024 but has since removed timeline references from its website. Within the company's model hierarchy of Opus, Sonnet, and Haiku, Opus represents the most powerful tier, yet its latest iteration remains unreleased.
Scaling law loses momentum
For years, AI development has relied on the Scaling Law, proposed by OpenAI in 2020, which correlates performance improvements with increased model parameters, training data, and computational resources. However, recent trends indicate diminishing returns, as marginal performance gains no longer justify the enormous costs of scaling.
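For intuition, the 2020 formulation ("Scaling Laws for Neural Language Models," Kaplan et al.) modeled test loss as a power law in each resource. A simplified sketch of the parameter term, with the constant N_c and the exponent alpha_N fit empirically, is:

    L(N) \approx (N_c / N)^{\alpha_N}

with analogous terms for dataset size and compute. Because the fitted exponents are small (on the order of 0.05 to 0.1), each further constant-factor reduction in loss demands a multiplicative increase in resources, which is the diminishing-returns pattern described above.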
Ilya Sutskever, OpenAI co-founder and CEO of startup SSI, who once championed the Scaling Law, now acknowledges its benefits have plateaued. He noted, "The 2010s were the era of model scaling. Today, everyone is searching for the next miracle." Additionally, Hugging Face scientist Margaret Mitchell suggests that building versatile AI models may require fundamentally new approaches.
However, Microsoft CTO Kevin Scott maintains that the Scaling Law remains crucial for advancing LLMs. He anticipates a shift from model training to inference optimization, enabling broader applications and better cost efficiency.
New frontiers emerge
Anthropic CEO Dario Amodei views the Scaling Law as an empirical guideline rather than a universal rule, stating, "I'm betting on it to continue, but I'm not entirely sure." Bill Gates predicts the law's relevance will persist for several years but emphasizes advancing AI metacognition (human-like thinking and planning) as the next frontier. Gates anticipates breakthroughs in metacognition by 2025, enabling AI to "step back and think" for improved accuracy and decision-making.
These scaling limitations have driven companies to explore alternative strategies. OpenAI has established "Foundations Teams" to optimize inference stages, introducing techniques like "test-time compute." This enables LLMs to simulate multi-step reasoning in real time, significantly improving performance on logic-driven tasks. OpenAI scientist Noam Brown notes that allowing AI 20 seconds to "think" during a task can match the performance gains from a 100,000-fold increase in model size or training time.
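The core idea behind test-time compute can be illustrated with a toy best-of-n loop: rather than returning the first answer, the system samples several candidate reasoning chains and keeps the one a scorer rates highest. The sketch below uses placeholder generate_candidate and score_candidate functions standing in for a model call and a verifier; it illustrates the general technique, not OpenAI's actual implementation.

    import random

    def generate_candidate(prompt: str, seed: int) -> str:
        # Stand-in for one sampled reasoning chain from an LLM
        # (a real system would call a model with temperature > 0).
        rng = random.Random(seed)
        return f"candidate {rng.randint(0, 9)} for: {prompt}"

    def score_candidate(candidate: str) -> float:
        # Stand-in for a verifier or reward model that rates a chain.
        return random.random()

    def best_of_n(prompt: str, n: int = 16) -> str:
        # Spend extra inference-time compute: sample n chains, keep the best-scoring one.
        candidates = [generate_candidate(prompt, seed=i) for i in range(n)]
        return max(candidates, key=score_candidate)

    print(best_of_n("plan a three-step itinerary", n=16))

In real systems the scorer is typically another model or a task-specific check, and n is the knob that trades extra "thinking" time for answer quality.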
Furthermore, OpenAI is shifting focus toward AI agents as the next generative AI breakthrough. CEO Sam Altman recently announced that GPT-5 would not be released in 2024; instead, a new AI agent codenamed "Operator" is scheduled for early 2025. Operator is designed to handle complex tasks, such as writing code and planning trips, with application programming interfaces (APIs) to follow.
Quality trumps quantity in training data
As high-quality training data becomes scarce, AI companies are implementing innovative solutions. OpenAI has partnered with numerous publishers, including TIME Magazine and The Wall Street Journal, to license content for training. Other firms are utilizing synthetic data and specialized labeling to enhance model quality.
Anthropic's Amodei and Microsoft's Scott agree that the quality and diversity of training data now matter more than sheer quantity. Companies are increasingly recruiting domain experts to refine data labeling and ensure precise, context-aware AI responses. Despite these challenges, industry leaders remain optimistic that such approaches can sustain AI progress.