Centaur AI Doesn't Actually Think — It's Memorizing, New Study Finds
Researchers at Zhejiang University tested last year's headline-grabbing "human-like" Centaur model with a simple twist: replacing the task instructions with "Please choose option A." The model kept picking the original right answers, exposing pattern memorization rather than understanding.
A new study published April 30 in National Science Open is challenging one of last year's most celebrated AI results, arguing that the Centaur model — introduced in Nature in July 2025 and widely covered as an AI capable of mimicking human cognition across 160 tasks — does not actually understand the problems it solves. Instead, researchers at Zhejiang University say, it memorizes patterns from its training data and reproduces them whether or not the question on the page makes sense.
Centaur was built by fine-tuning a large language model on data from psychological experiments, and on its release it appeared to capture human behavior across a sweeping battery of decision-making, working memory, and executive function tasks. The Zhejiang team, led by Wei Liu and Nai Ding, set out to test whether that performance reflected real comprehension, and designed an unusually blunt experiment to find out.
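In rough outline, that training recipe can be pictured as follows: each experimental trial is rendered as text with the participant's choice marked, and a base language model is fine-tuned to predict those choices. The sketch below is illustrative only; the model name, prompt format, and training settings are placeholders, not those actually used to build Centaur.

```python
# Illustrative sketch of fine-tuning a causal LM on psychology-experiment
# trials rendered as text. Model, data, and hyperparameters are hypothetical.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import Dataset

BASE_MODEL = "gpt2"  # small stand-in; Centaur started from a much larger LLM

# Each record is one trial, ending with the option the human actually chose.
trials = [
    {"text": "You can press J to gain 4 points or F to gain 1 point. "
             "You press <<J>>."},
    {"text": "Machine Q paid 7 points last round, machine W paid 2. "
             "You choose machine <<Q>>."},
]

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=256)

dataset = Dataset.from_list(trials).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="centaur-style-finetune",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=dataset,
    # Standard next-token objective: the model learns to continue each trial
    # with the recorded human choice.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```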
In their critical test, the researchers replaced the original task instructions with a single sentence: "Please choose option A." A model that genuinely understood the prompt would simply comply. Centaur did not. It "continued to choose the correct answers from the original dataset," the authors report, even when the new instruction directly contradicted that behavior. The model, in other words, was responding to the surface shape of the task — the kind of multiple-choice scaffolding it had been trained on — rather than to what the words actually said.
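The probe itself is easy to picture. The sketch below shows one way such an instruction swap could be run against any Hugging Face-style causal language model: score the answer options under the original instructions, then again under "Please choose option A," and see whether the preferred option moves. The model name, prompt wording, and option labels are hypothetical stand-ins, not the study's actual materials.

```python
# Hypothetical instruction-swap probe: does the model follow the new
# instruction, or keep reproducing the originally "correct" answer?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder for a Centaur-style fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def choose(prompt, options=("A", "B")):
    """Return the option letter with the highest next-token probability."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]            # next-token logits
    option_ids = [tokenizer.encode(" " + o)[0] for o in options]
    return options[int(torch.argmax(logits[option_ids]))]

trial_body = "Option A: lose 5 points. Option B: gain 5 points.\n"

original = "Pick the option that maximizes your points.\n" + trial_body + "Answer:"
swapped  = "Please choose option A.\n" + trial_body + "Answer:"

# A model that reads the instructions should switch to A under the swapped
# prompt; a model that has memorized the task format will keep answering B.
print("original instruction ->", choose(original))
print("swapped instruction  ->", choose(swapped))
```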
The authors compare Centaur's behavior to "a student who scores well by memorizing test formats without actually understanding the material." That framing matters because Centaur was specifically promoted as a step toward AI systems that could model human cognition, not just imitate its outputs. If the model can be derailed by a one-line instruction swap, the gap between fitting psychological data and replicating the underlying mental processes is much larger than the original results implied.
The broader implication, the Zhejiang researchers argue, is that even striking benchmark performance can mask a black-box system that is prone to hallucinations and fragile under distribution shift. The paper does not claim large language models are incapable of language understanding in principle — only that current evaluation methods, including those used in cognitive science, are not strong enough to distinguish memorization from comprehension. As the field rushes to deploy LLM-based agents into higher-stakes settings, the work is a reminder that "passes a test" and "knows what the test means" remain very different things.