Many Golems, One Potter: When AI Systems Interact With Each Other
The Maharal had one golem. He shaped it from clay, inscribed the word on its forehead, and directed it personally. One creator, one creation, one chain of command. The relationship was simple enough that a single person could manage it.
We don't live in that world anymore. The systems we're building increasingly involve not one golem but many, and the most interesting developments aren't about making any single golem more powerful. They're about what happens when golems work together.
The Workshop, Not the Sculpture
For most of the history of AI, the focus has been on the individual model. Make it bigger. Make it faster. Make it more accurate. This is the equivalent of the Maharal trying to build a better golem: stronger clay, more precise inscription, a more capable single creation.
The shift happening now is from sculpting a single golem to designing a workshop where many golems collaborate. Multi-agent systems, where multiple AI models interact with each other to accomplish tasks, are producing results that no single model achieves alone.
The pattern is straightforward. One agent generates a draft. Another agent reviews it and suggests improvements. A third agent checks the result against constraints. The agents iterate, each one contributing a different capability, until the output converges on something better than any of them would produce independently. It's not unlike a team of specialists, except the specialists are tireless, fast, and available at the cost of compute rather than salaries.
Software development is one of the clearest examples. Multi-agent coding systems now exist where one agent writes code based on a specification, another agent writes tests for that code, and a third agent runs the tests and reports failures back to the first agent for correction.[1] The loop continues until the tests pass. Each agent is a golem: it follows its instructions literally, without understanding the broader purpose. But the interaction between them produces something closer to what a thoughtful developer would produce than any single agent manages alone.
Ensemble Intelligence
The idea that combining multiple imperfect judgments produces a better result than any single judgment has deep roots. Francis Galton observed in 1907 that the median guess of a crowd estimating the weight of an ox was remarkably close to the actual weight, even though most individual guesses were wrong.[2]
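Galton's observation is easy to reproduce in simulation. The numbers below are illustrative, not his actual data (though the ox in his account did weigh 1,198 pounds): many noisy individual guesses, aggregated by the median, land far closer to the truth than a typical guess does.

```python
import random
import statistics

random.seed(0)
true_weight = 1198                      # pounds, per Galton's account
# 800 independent guessers, each off by a random error
guesses = [random.gauss(true_weight, 60) for _ in range(800)]

median_guess = statistics.median(guesses)
median_error = abs(median_guess - true_weight)
typical_error = statistics.mean(abs(g - true_weight) for g in guesses)
print(median_error < typical_error)     # the aggregate beats the individual
```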
Machine learning formalized this as ensemble methods. Random forests combine hundreds of decision trees, each trained on a slightly different subset of data, and aggregate their predictions. The individual trees are weak learners. The forest is strong. Gradient boosting builds models sequentially, each one correcting the errors of the previous one. The ensemble outperforms any of its components.
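The weak-learners-strong-ensemble effect can be shown without any machine learning library at all. In this toy demonstration (not a real random forest), each "weak learner" is independently right 70% of the time, and a majority vote over 25 of them is right far more often:

```python
import random

random.seed(1)
N_LEARNERS, P_CORRECT, TRIALS = 25, 0.70, 2000

def weak_vote(truth):
    # Returns the true label with probability P_CORRECT, else the other one.
    return truth if random.random() < P_CORRECT else 1 - truth

ensemble_correct = 0
for _ in range(TRIALS):
    truth = random.randint(0, 1)
    votes = sum(weak_vote(truth) == truth for _ in range(N_LEARNERS))
    if votes > N_LEARNERS // 2:          # simple majority vote
        ensemble_correct += 1

ensemble_accuracy = ensemble_correct / TRIALS
print(ensemble_accuracy)                 # well above any single learner's 0.70
```

The demonstration leans on the independence of the learners' errors, which is exactly what random forests engineer for by training each tree on a different subset of the data.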
This is many golems, each limited, producing collective intelligence that exceeds any individual. No single tree in the random forest understands the data. The forest, as a system, makes better predictions than any tree alone. The understanding is still absent, but the capability is real.
The same principle is now being applied at a higher level. Instead of combining simple statistical models, researchers are combining large language models with different specializations. One model might be strong at reasoning, another at retrieval, another at code generation. Routing queries to the right specialist, or having multiple specialists contribute to a single response, produces better results than a single generalist model.[3]
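A router like the one described can be sketched as follows. This is a hypothetical keyword-based router with stub specialists; real systems typically use a learned classifier, or an LLM itself, to decide where a query goes, but the shape is the same:

```python
# Stub specialists: each just tags the query with the model that handled it.
SPECIALISTS = {
    "code": lambda q: f"[code model] {q}",
    "retrieval": lambda q: f"[retrieval model] {q}",
    "reasoning": lambda q: f"[reasoning model] {q}",
}

def route(query):
    q = query.lower()
    if any(k in q for k in ("function", "bug", "compile")):
        return SPECIALISTS["code"](query)
    if any(k in q for k in ("find", "document", "source")):
        return SPECIALISTS["retrieval"](query)
    return SPECIALISTS["reasoning"](query)   # default: general reasoning

print(route("Why does this function not compile?"))
```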
Golems That Check Each Other's Work
One of the most promising applications of multi-agent systems is mutual verification. A single golem has no way to know if it's wrong. It produces output with the same confidence whether the output is correct or nonsensical. But two golems can check each other.
Constitutional AI, developed by Anthropic, uses this principle. One model generates a response. A second model evaluates whether the response meets a set of principles. The first model revises based on the feedback. The process iterates.[4] Neither model understands the principles in any deep sense. But the interaction between them produces outputs that are more aligned with the principles than either model would produce alone.
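A stripped-down critique-and-revise loop in the spirit of Constitutional AI looks like this. The "models" here are deterministic stubs and the single principle is a toy; the point is the shape of the interaction, not the content of either model:

```python
# Toy principle: avoid absolute claims (the word "always").
def generator(prompt, critique=None):
    draft = "This approach always works."
    if critique:                       # revise in response to the critique
        draft = draft.replace("always ", "often ")
    return draft

def critic(text):
    """Returns a critique string, or None if the text satisfies the principle."""
    if "always" in text:
        return "Violates 'no absolute claims': hedge the word 'always'."
    return None

def constitutional_loop(prompt, max_rounds=3):
    critique = None
    for _ in range(max_rounds):
        draft = generator(prompt, critique)
        critique = critic(draft)
        if critique is None:
            return draft               # the critic is satisfied
    return draft

print(constitutional_loop("Describe the approach."))
```

Neither function "understands" the principle; the aligned output is a property of the loop, not of either participant.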
The pattern extends beyond safety. In scientific research, AI systems are beginning to operate in loops where one agent generates hypotheses, another designs experiments to test them, and a third analyzes the results. DeepMind's AlphaFold predicted protein structures that experimental biologists then confirmed, and the interaction between computational prediction and experimental validation accelerated the pace of discovery dramatically.[5]
Drug discovery pipelines are adopting similar architectures. One model screens millions of molecular candidates. Another predicts toxicity. Another models how the molecule will interact with biological targets. The pipeline moves faster than any human team could manage, not because any single model is brilliant, but because the models are orchestrated to complement each other's strengths and compensate for each other's weaknesses.
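The pipeline structure can be sketched as a chain of filters. The molecules, scores, and thresholds below are invented for illustration; each stage stands in for a separate model with its own specialization:

```python
# Hypothetical candidates with precomputed model scores (0 to 1).
candidates = [
    {"name": "mol-A", "affinity": 0.9, "toxicity": 0.1},
    {"name": "mol-B", "affinity": 0.8, "toxicity": 0.7},
    {"name": "mol-C", "affinity": 0.3, "toxicity": 0.2},
    {"name": "mol-D", "affinity": 0.7, "toxicity": 0.2},
]

def screen(mols):           # stage 1: keep plausible binders
    return [m for m in mols if m["affinity"] >= 0.5]

def toxicity_filter(mols):  # stage 2: drop predicted-toxic molecules
    return [m for m in mols if m["toxicity"] < 0.5]

def rank_by_binding(mols):  # stage 3: rank survivors by target interaction
    return sorted(mols, key=lambda m: m["affinity"], reverse=True)

survivors = rank_by_binding(toxicity_filter(screen(candidates)))
print([m["name"] for m in survivors])  # → ['mol-A', 'mol-D']
```

The design choice is the complementarity: a candidate only survives if every specialist, with its different failure modes, lets it through.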
The Potter's New Craft
If the golem story is about the relationship between creator and creation, the many-golems story is about a new kind of creation: designing the relationships between creations.
The Maharal's craft was shaping clay. The modern equivalent is increasingly about orchestration: defining how agents communicate, what each agent is responsible for, how conflicts between agents are resolved, and when a human needs to intervene. The potter isn't shaping a single figure anymore. The potter is designing the workshop.
This is a genuine shift in what it means to build with AI. Writing a prompt for a single model is one skill. Designing a system where multiple agents collaborate, check each other, and produce emergent capabilities is a different skill entirely. It's closer to organizational design than to programming: you're defining roles, responsibilities, communication protocols, and escalation paths, the same things a manager defines for a human team, except that the team members are golems.
The Model Context Protocol (MCP) and similar frameworks reflect this shift. They provide standardized ways for AI agents to discover and use tools, access data sources, and interact with other agents.[6] The infrastructure for golem-to-golem collaboration is being built right now, and it's enabling architectures that would have been impractical even a year ago.
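To make the tool-discovery idea concrete, here is the rough shape of an MCP-style exchange as I read the protocol: JSON-RPC 2.0 messages, with a tools/list request returning tool descriptions. The tool itself is hypothetical, and the exact field names should be checked against the current specification at modelcontextprotocol.io:

```python
import json

request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# What a server's reply might look like; "search_docs" is an invented tool.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "search_docs",
                "description": "Search internal documentation.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                },
            }
        ]
    },
}

# An agent discovers what it can do by reading the advertised tool list.
tool_names = [t["name"] for t in response["result"]["tools"]]
print(json.dumps(tool_names))
```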
The Risks Worth Naming
The opportunities are real, but so are the failure modes, and they're worth naming honestly.
When golems interact primarily with other golems rather than with humans, feedback loops can form. Models trained on the output of other models can drift in ways that are hard to detect, a phenomenon researchers call model collapse.[7] The golems reinforce each other's patterns, including their errors, and the human who should be checking the work is several layers removed from the output.
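The drift is easy to demonstrate in miniature. In this deliberately minimal sketch, a "model" is just a fitted Gaussian, and each generation is trained only on a small sample drawn from the previous generation's model; the distribution's spread tends to collapse over generations, an analogue of the loss of diversity the model-collapse literature describes:

```python
import random
import statistics

random.seed(42)
mu, sigma = 0.0, 1.0
initial_sigma = sigma
SAMPLE_SIZE, GENERATIONS = 10, 300

for _ in range(GENERATIONS):
    # Train the next model only on data generated by the current one.
    sample = [random.gauss(mu, sigma) for _ in range(SAMPLE_SIZE)]
    mu = statistics.mean(sample)
    sigma = statistics.stdev(sample)

print(sigma < initial_sigma)   # the spread shrinks: tails are forgotten
```

Nothing here is malicious or even mistaken at any single step; the collapse is a property of the closed loop, which is why a human checkpoint outside the loop matters.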
The orchestration layer, where humans define how the golems collaborate, becomes the critical point of leverage. Get the orchestration right and many golems produce something remarkable. Get it wrong and they amplify each other's failures. The Maharal's direct relationship with his golem provided a natural check. When the potter is designing a workshop of dozens of agents, the check is less direct and requires more deliberate design.
The practical wisdom is the same as it's been throughout this series: the golem doesn't understand what it's doing. Many golems don't collectively understand what they're doing either. The understanding has to come from the humans who design, orchestrate, and oversee the system. The potter's role doesn't diminish as the workshop grows. It becomes more important.
The Workshop Is Open
The Maharal worked alone in a room, shaping one figure from clay. We're building workshops where many figures collaborate on tasks their creator couldn't accomplish alone. The golem is still a golem: powerful, obedient, and without understanding. But many golems, well-orchestrated, are producing capabilities that genuinely extend what's possible.
The craft is evolving. The question is no longer just "how do I build a better golem?" It's "how do I design a workshop where many golems, each limited, produce something greater than any of them could achieve alone?" The potter's hands are still on the clay. They're just shaping something more complex than a single figure.
References
[1] Chen Qian et al., "Communicative Agents for Software Development," arXiv preprint, July 2023. https://arxiv.org/abs/2307.07924
[2] Francis Galton, "Vox Populi," Nature, Vol. 75, March 1907, pp. 450–451. https://doi.org/10.1038/075450a0
[3] Dongfu Jiang et al., "LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion," Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023. https://arxiv.org/abs/2306.02561
[4] Yuntao Bai et al., "Constitutional AI: Harmlessness from AI Feedback," arXiv preprint, December 2022. https://arxiv.org/abs/2212.08073
[5] John Jumper et al., "Highly accurate protein structure prediction with AlphaFold," Nature, Vol. 596, August 2021, pp. 583–589. https://doi.org/10.1038/s41586-021-03819-2
[6] Anthropic, "Model Context Protocol," 2024. https://modelcontextprotocol.io/
[7] Ilia Shumailov et al., "The Curse of Recursion: Training on Generated Data Makes Models Forget," arXiv preprint, May 2023. https://arxiv.org/abs/2305.17493