Dan Carroll, Chief Scientist at Cranium
This is Part 1 of a two-part series exploring the capabilities and limitations of large language models (LLMs) in the context of artificial general intelligence (AGI). In this first part, we dive into the growing hype surrounding LLMs and their fundamental shortcomings.
The AI field in the 2020s feels like an endless series of hype cycles, with little time for critical evaluation. This environment, driven by the promise of “superintelligent labor” and massive financial returns, has turned AI research into a gold rush. Talented researchers are lured into corporate AI labs like OpenAI, Meta, or Anthropic, where the incentive structures discourage transparency and criticism.
For those outside these corporate bubbles, developing and testing large-scale models is nearly impossible due to the prohibitive costs. Instead, many researchers are left experimenting with smaller models, which lack the “emergent properties” attributed to their larger counterparts.
The Limits of Language Models
LLMs, impressive as they are, are not intelligent in the way we imagine AGI to be. They are next-token predictors, trained to minimize loss by guessing the next token in a sequence. Their transformer-based architecture enables them to produce contextually relevant outputs, yet they operate purely on statistical patterns, with no grounding in real-world understanding.
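To make that training objective concrete, here is a deliberately minimal sketch of next-token prediction as a cross-entropy loss. The toy model, tiny vocabulary, and random “tokens” are stand-ins for illustration only; no real LLM is this simple.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a real LLM has a transformer backbone, a huge vocabulary,
# and trillions of training tokens. These shapes are chosen purely so the
# example runs instantly.
vocab_size, d_model, seq_len, batch = 100, 32, 16, 4

# "Model": an embedding plus a linear head producing a score for every
# vocabulary item at every position in the sequence.
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),
)

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # random token ids
logits = model(tokens)                                   # (batch, seq_len, vocab_size)

# The target for position t is simply the token at position t + 1: the model
# is rewarded for assigning high probability to whatever token comes next.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),               # the "ground truth" next tokens
)
loss.backward()  # gradients nudge the model toward better next-token guesses
print(f"next-token loss: {loss.item():.3f}")
```

Everything an LLM “knows” is acquired by repeating this update over enormous text corpora; nothing in the objective asks for understanding, only for better guesses.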
Some common criticisms are well-founded:
- LLMs can produce grammatically polished text while still failing at tasks that require genuine understanding.
- Their mathematical reasoning is brittle, breaking down on problems that stray from familiar patterns.
- They hallucinate, confidently generating false outputs that are difficult to detect and mitigate.
Even OpenAI’s claims about the additional reasoning capabilities of models such as o1 remain unverifiable, because the models themselves are closed.
The Illusion of Progress
While LLMs show year-over-year improvement on benchmarks, this often reflects contamination: test problems leaking into the training data rather than genuine gains in capability. True generalization remains elusive, and the idea that we can brute-force our way to AGI by adding more parameters and data is magical thinking.
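To make the contamination concern concrete, here is a rough sketch of the kind of n-gram overlap check used to flag benchmark items that appear in training data. The helper names, the 8-gram window, and the in-memory corpus are assumptions for illustration; real audits run over indexed, deduplicated corpora measured in terabytes.

```python
# Crude benchmark-contamination check: what fraction of a test item's n-grams
# also appear somewhere in the training corpus? (Illustrative sketch only.)

def ngrams(text: str, n: int = 8) -> set:
    """All n-word shingles of the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_fraction(test_item: str, training_docs: list, n: int = 8) -> float:
    """Share of the test item's n-grams that also occur in the training docs."""
    test_grams = ngrams(test_item, n)
    if not test_grams:
        return 0.0
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return len(test_grams & train_grams) / len(test_grams)

# A benchmark question scoring near 1.0 was very likely seen during training,
# so "improvement" on it may reflect memorization rather than generalization.
```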
A Misguided Path to AGI?
Some have speculated that LLMs could represent the “fast thinking” (System 1) described in Daniel Kahneman’s Thinking, Fast and Slow, with “slow thinking” (System 2) emerging as a chain of simpler operations. This idea has inspired techniques such as Chain-of-Thought (CoT) prompting. However, recent research suggests these methods fall short in real-world applications.
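For readers who have not seen the technique, the sketch below shows the basic shape of CoT prompting as popularized by Wei et al. (2022): a worked exemplar with its reasoning written out step by step is prepended to the new question, nudging the model to imitate that format. The `call_llm` function is a hypothetical placeholder for whatever API or local model you use, not a real library call.

```python
# Minimal Chain-of-Thought prompting sketch. The exemplar is adapted from
# Wei et al. (2022); `call_llm` is a hypothetical stand-in, not a real API.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question: str) -> str:
    """Prepend the step-by-step exemplar so the model 'thinks out loud'."""
    return f"{COT_EXEMPLAR}\nQ: {question}\nA:"

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real completion endpoint or local model here.
    raise NotImplementedError("connect this to an actual model")

if __name__ == "__main__":
    print(cot_prompt(
        "A juggler can juggle 16 balls. Half of the balls are golf balls, "
        "and half of the golf balls are blue. How many blue golf balls are there?"
    ))
```

The bet behind CoT is that stringing together many such small, verbalized steps can stand in for System 2 reasoning; the studies cited below suggest that bet does not yet pay off outside curated benchmarks.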
LLMs might play a role in AGI, but they are not the sole solution. It’s time to look beyond the current paradigm and explore new approaches that address the inherent limitations of language models.
Sources Cited
- Langford, R., & Kim, T. (2024). Evaluating Grammar Proficiency in Advanced Language Models. Retrieved from https://arxiv.org/pdf/2404.14883
- Gopher, J., & Morozov, D. (2023). Mathematical Reasoning Limitations in Large Language Models. Retrieved from https://arxiv.org/pdf/2311.07618
- Zhu, P., & Martinez, L. (2024). Hallucination in LLMs: Understanding and Mitigating False Outputs. Retrieved from https://arxiv.org/pdf/2402.02420
- OpenAI. (n.d.). Learning to Reason with LLMs. Retrieved from https://openai.com/index/learning-to-reason-with-llms/
- Wei, J., Wang, X., Schuurmans, D., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Retrieved from https://arxiv.org/pdf/2201.11903
- Patel, A., & D’Souza, R. (2024). Chain of Thought and Generalization Challenges in LLMs. Retrieved from https://arxiv.org/pdf/2405.04776