A Small AI Breakthrough: New Model Outperforms GPT-5 on Reasoning with Just 27M Parameters
Scientists at the Singapore-based company Sapient have unveiled a novel approach to artificial intelligence that operates in a fundamentally different way from conventional large language models, and it already surpasses its competitors on key benchmarks. Their system, the Hierarchical Reasoning Model (HRM), is inspired by the architecture of the human brain, where different regions process information on varying timescales, from milliseconds to minutes, gradually weaving fleeting impulses and extended processes into a coherent whole.
Unlike mainstream models such as GPT-5, Anthropic's Claude, or DeepSeek, the new algorithm is both far more compact and markedly more efficient. According to the study, HRM contains just 27 million parameters and was trained on a mere 1,000 examples. By comparison, today's leading language systems contain billions or even trillions of parameters; GPT-5, for instance, is estimated at three to five trillion, several orders of magnitude more.
The model was tested on ARC-AGI, one of the most challenging benchmarks designed to assess how closely algorithms approach the level of general artificial intelligence. In its first run, HRM scored 40.3%, compared to OpenAI’s o3-mini-high at 34.5%, Anthropic Claude 3.7 at 21.2%, and DeepSeek R1 at 15.8%. On the more demanding ARC-AGI-2, Sapient’s system again led the pack: 5%, versus 3% for OpenAI, 1.3% for DeepSeek, and less than 1% for Claude.
Most current language models rely on the so-called “chain-of-thought” reasoning method, where a complex problem is broken down into smaller subtasks expressed as intermediate steps in natural language. While this mimics the way humans often reason in fragments, it demands enormous datasets, produces unstable results, and frequently operates too slowly.
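The "intermediate steps in natural language" that chain-of-thought relies on can be illustrated with a toy example (the problem and wording here are invented for illustration, not drawn from the study): instead of emitting an answer directly, the model spells out each partial result before concluding.

```python
# Toy illustration of chain-of-thought style reasoning: a multi-step word
# problem is solved by writing out intermediate results in natural language
# before stating the final answer.
problem = "A box holds 12 eggs. You buy 3 boxes and break 5 eggs. How many remain?"

steps = []
total = 12 * 3
steps.append(f"Step 1: 3 boxes of 12 eggs = {total} eggs.")
remaining = total - 5
steps.append(f"Step 2: {total} eggs - 5 broken = {remaining} eggs.")
steps.append(f"Answer: {remaining}")

print("\n".join(steps))
# Step 1: 3 boxes of 12 eggs = 36 eggs.
# Step 2: 36 eggs - 5 broken = 31 eggs.
# Answer: 31
```

Each verbalized step is an extra chance to go wrong, which is part of why the approach can be unstable and slow; HRM's alternative, described next, keeps this decomposition internal.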
HRM, by contrast, employs a different principle: it reasons sequentially without explicitly articulating intermediate steps. It is structured as two modules, a high-level planner for slower, abstract strategy and a low-level processor for rapid, detailed computations, a division reminiscent of the specialized functions of different regions of the human brain.
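The two-timescale coupling can be sketched as a pair of nested recurrent loops: a fast module iterates many times per single update of a slow module. This is a minimal NumPy sketch of the general idea, not Sapient's implementation; the dimensions, step counts, and random fixed weights are all hypothetical (a real model would learn the weights).

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # hypothetical state dimension

def rnn_step(state, inputs, w_state, w_in):
    """One recurrent update: mix the current state with inputs through fixed weights."""
    return np.tanh(state @ w_state + inputs @ w_in)

# Hypothetical fixed random weights for the high (slow) and low (fast) modules.
w_hs, w_hi = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(2))
w_ls, w_li = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(2))

x = rng.normal(size=dim)   # encoded puzzle input
z_high = np.zeros(dim)     # slow, abstract planner state
z_low = np.zeros(dim)      # fast, detailed processor state

N_HIGH, T_LOW = 4, 8       # the planner updates once per T_LOW processor steps

for _ in range(N_HIGH):
    for _ in range(T_LOW):
        # Fast module refines details of the input under the current plan.
        z_low = rnn_step(z_low, x + z_high, w_ls, w_li)
    # Slow module revises its plan from the fast module's result.
    z_high = rnn_step(z_high, z_low, w_hs, w_hi)

print(z_high.shape)  # (16,)
```

Note that all the "reasoning" here happens in hidden state vectors; nothing is verbalized until the final state is decoded into an answer.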
The system also leverages iterative refinement: an initial answer is repeatedly revised in short bursts of “micro-deliberations.” After each cycle, the model decides whether to continue reasoning or deliver a final output. This design enabled HRM to succeed where traditional LLMs have failed altogether. For instance, it achieved near-perfect accuracy in solving sudoku puzzles and efficiently computed optimal paths through complex mazes.
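The refine-then-decide loop can be sketched as follows. This is only a schematic under stated assumptions: `refine` stands in for one "micro-deliberation" (here a toy numeric update toward a fixed target of 42.0), and `should_halt` is a hand-written stopping rule, whereas the paper describes a learned halt/continue decision.

```python
def refine(answer):
    """Hypothetical micro-deliberation: nudge the current answer toward a toy target."""
    return answer + 0.5 * (42.0 - answer)

def should_halt(prev, curr, tol=1e-3):
    """Stand-in stopping rule: halt once the answer stops changing.
    HRM learns this continue-or-stop decision rather than hard-coding it."""
    return abs(curr - prev) < tol

answer = 0.0
for step in range(100):  # hard cap on deliberation cycles
    new_answer = refine(answer)
    if should_halt(answer, new_answer):
        break  # deliver the final output
    answer = new_answer  # otherwise, keep reasoning

print(round(answer, 2))  # 42.0, reached in ~16 cycles rather than the full 100
```

The appeal of this design is that easy inputs exit after a few cycles while hard ones get more computation, instead of every input paying a fixed cost.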
Sapient has released the model’s code on GitHub, and ARC-AGI organizers independently confirmed the reported results. Interestingly, they noted that the architecture itself contributed only modestly to the performance gains; the real leap came from the refinement process during training, a detail the original paper had underemphasized.
Although the study has not yet undergone peer review, the community is already abuzz: HRM suggests that smaller, relatively simple models can outperform massive LLMs when designed to reflect the principles of human cognition.