Occam's Razor and Hasty Generalisation
Occam's razor is my favourite scientific principle. I mentioned it before in one of my earlier blog posts on the History of AI, in the context of the work of Ray Solomonoff. This time I'll try to apply it to the contemporary discussions that attempt to define generative AI.
Think of Occam's razor as a skilled sculptor working with marble. The artist removes everything unnecessary, carving away excess stone to reveal the true form within. But chisel away too much and you've destroyed the very essence of what you were creating; stop too soon and the form stays buried in excess stone. The sculptor seeks the simplest form that still captures the subject's soul - not the crudest shape that vaguely resembles it. In science, we do the same: strip away unnecessary complications while ensuring our explanation still holds the essence of truth. A theory too simple becomes a caricature; one too complex obscures with needless detail. The art lies in finding that perfect balance where nothing more can be removed without losing something essential.
This delicate balance becomes our guiding light when exploring the true nature of generative artificial intelligence. We must resist both the temptation to add unnecessary complexity and the urge to reduce these systems to something simpler than they truly are.
The Parrot vs. the Organism: Applying the Razor to GenAI and Developing a Deeper Understanding
Right, we talked about the Razor – don't make things more complicated than they need to be, but don't slice away the important bits either. Let's apply that to these AI models everyone's talking about, the ones that write stories or answer emails. What's actually going on under the hood?
One idea you hear a lot is that they're like "stochastic parrots." Fancy words, but the basic idea is simple: they've seen so much text (books, websites, everything!) that they're incredibly good at figuring out which word usually comes next. You give it "Roses are red, violets are..." and it knows "blue" is a super likely next word. It's like a super-powered autocomplete, mimicking patterns without understanding what roses or blue are. [1] This picture is appealing because it fits Occam's Razor in one way – it avoids making grand claims about the AI "thinking" or "feeling." It warns us, rightly, not to be fooled by smooth talk. If it just mimics patterns, that's much simpler than inventing some kind of conscious thought.
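To make the "super-powered autocomplete" idea concrete, here is a minimal sketch, assuming Python with the open-source Hugging Face transformers library and the small public GPT-2 model (chosen purely for illustration; it is not the model behind any particular chatbot). It simply asks the model which words it considers most likely to come next after "Roses are red, violets are":

```python
# A toy demonstration of next-word prediction with a small open model (GPT-2).
# This only sketches the general idea of statistical autocomplete; it says
# nothing about the internals of any particular commercial system.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Roses are red, violets are"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Turn the scores at the last position into probabilities over the vocabulary.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)

for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
# Expect ' blue' to show up at or near the top of the list.
```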
But is that all there is to it? Is "parrot" slicing away too much?
Then you have computer scientists who are actually cracking these things open and looking inside, like apprentice mechanics figuring out a new kind of engine. What they're finding isn't just random connections. They see specific parts – "circuits" – that seem dedicated to particular jobs. One circuit helps figure out grammar, another might handle adding numbers (even when the problem isn't written like 2+2=), another helps plan out the rhymes in a poem before it even writes the line! They even see the AI figuring things out in steps, like identifying "Texas" when asked about the capital of the state Dallas is in, before it gets to "Austin." [2]
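The techniques behind those findings (attribution graphs, circuit tracing) are involved, but even a crude "logit lens" style probe gives a feel for looking at a model stage by stage. The sketch below, again using the small open GPT-2 model as a stand-in (it is far too weak to reproduce the Dallas-to-Texas-to-Austin result, and this is not the attribution-graph method used in the research described above), decodes each layer's intermediate state to see what the model is leaning towards at that depth:

```python
# A rough "logit lens" probe: read out each layer's intermediate representation
# of the last token as a running guess at the next word. A simplified stand-in
# for real circuit analysis, offered only to show that there are stages inside.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of the state containing Dallas is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
    # outputs.hidden_states holds one tensor per layer (plus the embeddings),
    # each of shape (1, sequence_length, hidden_size).
    for layer_index, hidden in enumerate(outputs.hidden_states):
        # Apply the model's final layer norm and output head to interpret
        # this layer's last-token state as a distribution over words.
        logits = model.lm_head(model.transformer.ln_f(hidden[0, -1]))
        top_id = logits.argmax().item()
        print(f"layer {layer_index:2d} top guess: {tokenizer.decode(top_id)!r}")
```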
For instance, when researchers at Anthropic asked one of these models to write a rhyming couplet starting with "He saw a carrot and had to grab it," they discovered something remarkable by looking at its internal workings. Before the model even began writing the second line, it was already activating internal features representing words that could rhyme with "grab it" – specifically words like "rabbit" and "habit." The model wasn't just picking the most likely next word over and over; it was planning ahead for how the entire line should end! Even more fascinating, when researchers deliberately modified these internal "planning" features – for example, injecting the concept "green" instead – the model completely restructured its response to write a different line ending with that word. This is evidence of deliberate multi-word planning. [3]
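Anthropic's intervention edited specific learned, interpretable features, which is far more surgical than anything a few lines of code can do. Still, a blunt analogue (adding a "steering vector" into one layer of a small open model during generation and watching the continuation bend) conveys what "modifying internal features" means in practice. Everything below is an illustrative assumption: the choice of GPT-2, the layer, and the steering strength.

```python
# A blunt analogue of "injecting a concept": add a steering vector into one
# transformer block while the model generates and watch the output change.
# This is NOT Anthropic's feature-editing method, only a sketch of the general
# idea of intervening on internal activations.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6        # which block to steer (arbitrary illustrative choice)
STRENGTH = 4.0   # how hard to push (also arbitrary)

def last_token_state(text: str) -> torch.Tensor:
    """Hidden state of the final token after block LAYER, used to build a crude 'concept' direction."""
    ids = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden_states = model(**ids, output_hidden_states=True).hidden_states
    return hidden_states[LAYER + 1][0, -1]  # index 0 is the embedding layer

# Direction pointing roughly from "rabbit" towards "green".
steering_vector = last_token_state(" green") - last_token_state(" rabbit")

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    if isinstance(output, tuple):
        return (output[0] + STRENGTH * steering_vector,) + output[1:]
    return output + STRENGTH * steering_vector

handle = model.transformer.h[LAYER].register_forward_hook(steer)

prompt = "He saw a carrot and had to grab it,"
ids = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**ids, max_new_tokens=12, do_sample=False)
print(tokenizer.decode(generated[0]))

handle.remove()  # detach the hook so later calls run unmodified
```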
So, this isn't just simple parroting. There's complex machinery inside. It's processing information in stages, using specialised parts, and even doing things that look a lot like planning. Does this mean they understand like we do? Almost certainly not! Understanding involves connecting things to the real world, having experiences, knowing why something is true. These models don't have that because, among other things, they do not have bodies. They're still fundamentally pattern-driven, built on the statistics of the data they swallowed.
Think of it maybe like an incredibly complex, self-programming player piano. It can play a beautiful, intricate piece of music (the output text). It has learned sophisticated rules about harmony and rhythm (the internal circuits and mechanisms). It can even "improvise" in a way that sounds good based on those rules. But does it feel the music? Does it understand the heartbreak or joy in the notes? No. It's executing a complex procedure it learned from data, and that data does not contain the feelings themselves, because only human brains have the circuitry for them.
So, Occam's Razor tells us this: Don't invent consciousness or true understanding where there's no evidence for it – the "parrot" view correctly warns us about that. But also, don't ignore the complex machinery and the step-by-step procedures we can see inside – the simple parrot idea doesn't capture that reality.
The simplest explanation that fits what we currently know is that these are incredibly complex pattern-matching machines that have developed internal structures and procedures for processing information and generating likely outputs. They're more than parrots, but they're definitely not human minds. They're something new, a kind of intricate computational mechanism we're just beginning to figure out. The key is to describe the mechanism we see, without adding magical thinking or ignoring the gears. And that task isn't getting any easier: the machines are changing every day, not just by swallowing new data but by being fundamentally reworked through the efforts of scientists and tinkerers around the world, so however we define them today may be outdated tomorrow.
References
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623). Association for Computing Machinery. https://doi.org/10.1145/3442188.3445922
Anthropic. (2025). Tracing the thoughts of a large language model. https://www.anthropic.com/news/tracing-thoughts-language-model
Lindsey, J., Gurnee, W., Ameisen, E., Chen, B., Pearce, A., Turner, N. L., Citro, C., Abrahams, D., Carter, S., Hosmer, B., Marcus, J., Sklar, M., Templeton, A., Bricken, T., McDougall, C., Cunningham, H., Henighan, T., Jermyn, A., Jones, A., Persic, A., Qi, Z., Thompson, T. B., Zimmerman, S., Rivoire, K., Conerly, T., Olah, C., & Batson, J. (2025). On the biology of a large language model. Transformer Circuits Thread. https://transformer-circuits.pub/2025/attribution-graphs/biology.html