A new study by researchers at Apple challenges prevailing assumptions about the capabilities of frontier AI models designed for reasoning. The paper, “The Illusion of Thinking,” reveals that Large Reasoning Models (LRMs)—advanced variants of large language models (LLMs) trained to mimic structured “thinking”—fail to scale effectively when faced with complex tasks, despite their impressive performance on standard benchmarks.
Researchers Parshin Shojaee, Iman Mirzadeh, and colleagues found that while LRMs such as Claude 3.7 Sonnet Thinking, DeepSeek-R1, and OpenAI's o3-mini hold an advantage on medium-difficulty reasoning tasks, their accuracy collapses entirely once task complexity passes a critical point.
The authors report a complete breakdown in reasoning accuracy once tasks surpass a certain complexity threshold. Unlike traditional evaluations built around math or coding benchmarks, the study used controlled puzzle environments, such as Tower of Hanoi and River Crossing, whose difficulty can be increased systematically, making it possible to measure precisely how reasoning scales.
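To see why such puzzles make good complexity dials, consider Tower of Hanoi: with N disks the optimal solution takes exactly 2^N − 1 moves, so each added disk roughly doubles the minimum solution length, and every candidate answer can be checked mechanically. Below is a minimal sketch of the classic recursive solver, illustrating that scaling; it is not the paper's actual evaluation harness.

```python
def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Classic recursive Tower of Hanoi solver.

    Returns the optimal move sequence; its length is 2**n - 1,
    so each extra disk roughly doubles the problem size.
    """
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, dst, aux, moves)   # park the top n-1 disks on the spare peg
    moves.append((src, dst))             # move the largest disk to the target
    hanoi(n - 1, aux, src, dst, moves)   # restack the n-1 disks on top of it
    return moves

for n in (3, 5, 10):
    print(n, "disks ->", len(hanoi(n)), "moves")  # 7, 31, 1023
```

This exponential growth is what lets the researchers sweep difficulty smoothly: each increment of N demands a longer, fully verifiable move sequence rather than a qualitatively different problem.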
The results identify three distinct "regimes" of reasoning: low-complexity tasks, where standard LLMs actually outperform their reasoning-focused counterparts; medium-complexity tasks, where LRMs show an edge; and high-complexity tasks, where both types of model fail. More surprisingly, LRMs reduce their own reasoning effort (measured in thinking tokens) as problems approach the collapse point, despite having ample token budget remaining, suggesting an inherent scaling flaw.
In-depth analysis of the models' reasoning traces revealed patterns of "overthinking," in which models continue exploring incorrect alternatives long after finding a correct solution. On more complex tasks, LRMs often fail to find any valid solution at all. Even when explicitly handed a step-by-step solution algorithm, models such as Claude 3.7 Sonnet still struggled to execute it reliably.
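Because every puzzle state is fully specified, a model's emitted move sequence can be checked move by move, which is how such a study can separate "found a valid solution" from "did not," and catch execution errors even when the algorithm was supplied. The sketch below is a hypothetical validator of that kind; the function name and structure are illustrative, not taken from the paper's code.

```python
def validate_hanoi(n, moves):
    """Replay a proposed move list on an n-disk Tower of Hanoi.

    Returns True only if every move is legal (never a larger disk
    on a smaller one) and all disks end up on peg 'C'.
    Illustrative sketch, not the paper's evaluation code.
    """
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # bottom-to-top
    for src, dst in moves:
        if not pegs[src]:
            return False                       # moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                       # larger disk placed on smaller
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))  # solved position

# A correct 2-disk solution passes; any illegal or incomplete one fails.
print(validate_hanoi(2, [("A", "B"), ("A", "C"), ("B", "C")]))  # True
```

A checker like this makes "struggled to execute the algorithm" a concrete, measurable claim: the first illegal or missing move pinpoints exactly where execution broke down.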
By exposing these limitations, the study signals the need to rethink how AI reasoning is evaluated, and raises important questions about whether current models are truly equipped for the demands of real-world problem-solving.
Need Help?
If you’re concerned or have questions about how to navigate the global AI regulatory landscape, don’t hesitate to reach out to BABL AI. Their Audit Experts can offer valuable insight and ensure you’re informed and compliant.