
According to a new Apple AI study, advanced models experience a complete collapse in accuracy when faced with sufficiently complex problems. While the technology industry races to build ever more advanced systems, the study's researchers found that existing AI models have fundamental limitations in their reasoning on high-complexity problems. These large reasoning models (LRMs) are meant to be the cutting edge of artificial intelligence, solving problems by breaking them into smaller steps through detailed thought processes. Yet when the researchers tested the LRMs on puzzles, the models began to reduce their reasoning effort once they approached the limits of their performance.

The Shocking Conclusion of the Apple AI Study


While companies such as Sam Altman’s OpenAI, Anthropic, and Google have lately made waves in the public sphere, the new Apple AI study has raised concerns over how reliable these models are at complex reasoning. While these companies have increasingly asserted that their most advanced systems can reason, the researchers at Apple dismissed this as an “illusion of thinking.” The study also revealed that these reasoning models waste computing power: on simple problems, they often find the correct solution early in their thinking process but continue exploring incorrect alternatives anyway.

Yet as the problems grew in complexity, the models initially explored incorrect solutions before eventually arriving at the correct ones. On very complex problems, the models collapsed entirely and produced no correct answers at all. “Upon approaching a critical threshold – which closely corresponds to their accuracy collapse point – models counterintuitively begin to reduce their reasoning effort despite increasing problem difficulty,” the researchers noted. For them, this was a clear indication that existing reasoning models have a “fundamental scaling limitation in thinking capabilities.”

Conclusion of the Study


The study showed that these AI models struggle with problem-solving beyond a specific complexity threshold, and it linked part of this to an “overthinking phenomenon.” The authors also argued that the current approach to benchmarking frequently suffers from data contamination. In the paper, the researchers concluded that they “found that LRMs have limitations in exact computation. They fail to use explicit algorithms and reason inconsistently across puzzles.”

While there is currently much hype around the AI industry, the team feels these findings raise “crucial questions” about the real reasoning capabilities of these models. The study tested the LRMs’ reasoning abilities using puzzle challenges, such as the River Crossing and Tower of Hanoi puzzles. However, the team acknowledged that this focus on puzzles is a limitation of the study.
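Part of what makes Tower of Hanoi a useful benchmark is that it has a known exact algorithm: a short recursion solves any instance, and the required number of moves grows as 2^n − 1, so difficulty can be scaled precisely. A minimal sketch in Python (the function name and signature here are illustrative, not taken from the paper):

```python
def hanoi(n, source, target, auxiliary, moves):
    """Recursively solve Tower of Hanoi, recording each move as (from, to)."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, move the largest, then restack.
    hanoi(n - 1, source, auxiliary, target, moves)
    moves.append((source, target))
    hanoi(n - 1, auxiliary, target, source, moves)

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))  # a 3-disk puzzle requires 2**3 - 1 = 7 moves
```

Because the optimal move sequence is fully determined by this recursion, any deviation by a model is unambiguous, which is exactly the kind of exact, multi-step computation the researchers say LRMs fail to carry out consistently.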

The Study Has Received Mixed Reviews


While many within the industry have found the study’s conclusions concerning, some critics feel the study may be somewhat self-serving. After all, Apple has faced plenty of criticism because its own models have been outperformed by many of the same models it tested. Was the study conducted simply to deflect attention from how far Apple has fallen behind in the AI race? Some experts have said that Apple should focus on improving its own models rather than putting others down.

However, others feel there is a valid discussion to be had, and that much of the backlash exists because the study challenges strongly held beliefs about reasoning models. Anyone who has spent time with an LLM knows that these systems can make factual mistakes or confidently state falsehoods. While newer models are becoming less prone to such hallucinations, they remain common enough to make anyone wary of over-relying on them.

The Bottom Line


AI has proven to be very useful for many personal and work-oriented tasks. However, the new Apple AI study has revealed fundamental issues when it comes to dealing with very complex puzzles and problems. This is a significant issue, suggesting that our current AI models don’t have the capacity for adaptive reasoning and true abstraction, especially when the puzzles require multi-step logic that isn’t already in the training data. However, this should not be seen as an indication that things are slowing down. In fact, researchers are currently exploring the potential of hybrid models that combine neural networks and symbolic reasoning. 

For anyone who works with the tool daily, this study may serve as a much-needed reminder that these AI tools work best as a servant and not a master. They may work well as an assistant, but your input, discretion, and guidance are still required to ensure that you aren’t getting garbage output. While the industry may be rocketing forward at a rapid pace, it’s important to take a step back every now and again and make sure that you aren’t getting swept up in the hype. 
