Apple's team argues LLMs rely on pattern matching, not true formal reasoning or understanding.
Apple researchers have published a study detailing key limitations of large language models (LLMs) from major AI labs such as OpenAI.
The study, conducted by scientists at the tech giant and published this month, introduces a new benchmark for evaluating LLMs' mathematical reasoning skills. That benchmark has exposed limitations in some of the world's top LLMs, including OpenAI's GPT-4o and o1 models.
Specifically, the paper found that rewording questions or adding irrelevant phrases could drastically change the results; in some cases, accuracy dropped by up to 65%. As the questions grew more complex, results varied more widely and average accuracy fell.
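The perturbation idea behind the benchmark can be sketched in a few lines. This is an illustrative toy, not the researchers' actual code: the template, names, and irrelevant "no-op" clause below are invented for the example.

```python
import random

# Toy sketch of the benchmark's perturbation idea (illustrative only):
# turn one grade-school math problem into many variants by swapping names
# and numbers, and optionally appending an irrelevant clause that should
# not change the answer -- but, per the study, often changes model output.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "How many apples does {name} have in total?"
)
NO_OP = " Five of the apples are slightly smaller than average."  # irrelevant detail


def make_variant(rng: random.Random, add_noop: bool = False) -> tuple[str, int]:
    """Return (question text, correct answer) for one randomized variant."""
    name = rng.choice(["Sophie", "Liam", "Ava"])
    a, b = rng.randint(2, 20), rng.randint(2, 20)
    question = TEMPLATE.format(name=name, a=a, b=b)
    if add_noop:
        question += NO_OP  # the clause never affects the true answer
    return question, a + b


rng = random.Random(0)
question, answer = make_variant(rng, add_noop=True)
```

A reasoner that truly models the arithmetic would score the same on every variant; the study's finding is that LLM accuracy instead swings with these surface changes.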
As such, the team at Apple, made up of Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar, concluded that their research shows "no evidence of formal reasoning" in the models tested. They argue the behavior is better explained by sophisticated pattern matching than by genuine mathematical reasoning.
What are LLMs?
LLMs underpin many of today's AI-powered tools. They are a type of AI model that uses machine learning to understand and generate human language, making them useful for text analysis, interpreting prompts, and similar tasks. LLMs are typically trained on large amounts of data, such as books and articles, to learn how language works.
However, this research appears to support the theory that LLMs cannot truly reason yet. That leaves some doubt about whether they can be trusted with more complex tasks, at a time when many companies are using AI for increasingly important roles.
For example, OpenAI CEO Sam Altman has outlined plans for AI to feed into virtually every aspect of life, from healthcare and education to home assistants and workplace aids. Other tech leaders have already questioned whether superintelligent AI is really as close as Altman claims, including Meta's chief AI scientist, Yann LeCun, who labeled such hopes "complete B.S."