Nature Study: Human Scientists Still Outperform Top AI Agents on Complex Research Tasks
Research·2 min read·Nature

A major new study published in Nature finds that the best AI agents complete complex scientific tasks at roughly half the rate of expert human scientists, challenging optimistic narratives about near-term autonomous AI research.

A landmark study published in Nature on April 13, 2026 has delivered a sobering reality check on the capabilities of today's most advanced AI agents: when pitted against human PhD scientists on complex, multi-step research tasks, the best AI agents perform at roughly half the success rate of their human counterparts. The findings, highlighted in the Stanford HAI 2026 AI Index Report released the same week, are expected to reshape expectations about the pace of autonomous AI-driven scientific discovery.

The study evaluated state-of-the-art agentic systems, including tool-using LLM agents capable of running code, searching literature, and designing experiments, on a benchmark of complex scientific workflows drawn from biology, chemistry, and materials science. Human experts with PhDs in the relevant fields served as the comparison group. The gap was consistent across domains: AI agents succeeded at roughly half the rate of human scientists, and their performance dropped sharply as task complexity and the number of required reasoning steps increased.

The researchers identified several recurring failure modes. AI agents tended to struggle with tasks requiring genuine experimental judgment, such as knowing when an unexpected result warrants a pivot and when it is merely an artifact of error. They were also poorly calibrated, often expressing high certainty in incorrect conclusions. Long-horizon planning, where a scientist must hold a multi-week experimental agenda in mind, remained a major weakness: agents lost coherence across extended task sequences even with large context windows.

The findings arrive at a moment of peak enthusiasm for AI-accelerated science, with major funding agencies and pharmaceutical companies betting heavily on autonomous lab agents. Study co-authors are careful to note that AI tools still provide significant value as assistants — boosting individual scientist productivity, surfacing relevant literature, and automating routine assays — but the vision of fully autonomous AI scientists replacing human-led inquiry remains a distant prospect. The paper calls on institutions, funders, and publishers to adapt their frameworks for evaluating and crediting human-AI collaborative research rather than assuming AI autonomy as a near-term baseline.
