Google Titans & MIRAS: New Architecture for 2M+ Token Long-Term AI Memory

by Nam Phong · December 9, 2025

Google has unveiled Titans, a new architecture for processing long sequences, along with MIRAS, a theoretical framework. Together they aim to combine the speed of recurrent networks with the precision of transformers. According to the company, this approach enables models to retain essential details across extraordinarily long contexts, from sprawling documents to genomic data.

Classical transformers revolutionized the field through the attention mechanism, which allows a model to “look back” and isolate the most meaningful portions of text. Yet this capability comes at a cost: the computational complexity of attention grows quadratically with sequence length, making it prohibitively expensive to scale transformers to contexts of millions of tokens.
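To make the quadratic growth concrete, the short Python sketch below counts the pairwise query-key scores that full self-attention computes for one head in one layer. The function name and the sequence lengths are illustrative choices, not figures from Google's papers.

```python
def pairwise_scores(seq_len: int) -> int:
    """Query-key scores in full self-attention (one head, one layer)."""
    # Every token is compared with every token, so the count is n * n.
    return seq_len ** 2


for n in (1_000, 10_000, 2_000_000):
    print(f"{n:>9,} tokens -> {pairwise_scores(n):,} scores")
```

Doubling the sequence length quadruples the count, which is why contexts of millions of tokens quickly become impractical for plain attention.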

The research community has already experimented with circumventing this limitation through faster, linear-time architectures — such as efficient RNNs and state space models like Mamba-2. These systems compress past context into a fixed-size vector, enabling linear scaling. But this memory constraint makes them struggle with truly rich, extended sequences where subtle relationships between facts must be preserved.
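The trade-off of a fixed-size state can be sketched with a toy example. The code below folds a sequence into a single vector with an exponential moving average. This is a deliberately simplified stand-in, not how Mamba-2 or any particular RNN works, and the `compress` function and its `decay` parameter are assumptions made for illustration.

```python
def compress(tokens, decay=0.9):
    """Fold a sequence of equal-length vectors into one fixed-size state."""
    state = [0.0] * len(tokens[0])
    for x in tokens:
        # One constant-cost update per token: linear time overall.
        state = [decay * s + (1 - decay) * xi for s, xi in zip(state, x)]
    return state


short = compress([[1.0, 2.0]] * 5)
long = compress([[1.0, 2.0]] * 500)
print(len(short), len(long))  # the state never grows with the sequence

# An early token followed by 50 unrelated ones has almost vanished.
faded = compress([[1.0]] + [[0.0]] * 50)
print(faded[0])
```

The state size stays constant no matter how long the input is, but early details are progressively squeezed out, which is the limitation the article describes.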

In two new papers, Google proposes the combined use of Titans and MIRAS. Titans is a concrete architecture; MIRAS is a unifying theoretical framework that interprets such systems as a form of associative memory. Together they advance the idea of “inference-time learning”: a model can refine itself on the fly, not only retrieving information from parameters but also updating its long-term memory as new data arrives.

The central idea of Titans is a memory hierarchy reminiscent of human short-term and long-term memory. Short-term memory is still handled by attention, which excels at local contexts. But long-term memory is implemented not as a single fixed vector or matrix, but as a full-fledged deep neural network — a multilayer perceptron. This allows the model to encode past context far more richly and to “understand” narrative structure rather than merely store scattered notes.
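As a rough illustration of what "memory as a neural network" means, the sketch below stores the memory as the weights of a tiny two-layer perceptron and reads from it by passing a key through the network. The layer sizes, the tanh activation, the random initialization, and the names `init_memory` and `read_memory` are all assumptions for this example. The article does not specify these details.

```python
import math
import random


def init_memory(dim, hidden, seed=0):
    """Create the weights of a small two-layer MLP that acts as the memory."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]
    return w1, w2


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def read_memory(memory, key):
    """Retrieve a value by running the key through the memory network."""
    w1, w2 = memory
    hidden = [math.tanh(h) for h in matvec(w1, key)]
    return matvec(w2, hidden)


memory = init_memory(dim=4, hidden=8)
print(read_memory(memory, [1.0, 0.0, 0.0, 0.0]))
```

Here what the model "remembers" lives in the weights `w1` and `w2` rather than in a single state vector, so writing to memory means changing those weights.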

Another defining concept is the “surprise” mechanism. Humans readily forget routine details but vividly recall events that break expectations. Titans operates on the same principle: the model compares the current memory state with the new input and measures how “unexpected” it is. If the discrepancy is small, there is no need to store it long-term. If the input sharply diverges from expectations, it signals an important or anomalous fact worth preserving.
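One way to quantify such "surprise", assumed here for illustration, is the gradient of a recall error: how much the memory would have to change to map the incoming key to the incoming value. The sketch uses a plain linear memory (a matrix) instead of the deep network the article describes, and every name in it is hypothetical.

```python
import math


def predict(memory, key):
    """What the (linear) memory currently returns for this key."""
    return [sum(w * k for w, k in zip(row, key)) for row in memory]


def surprise(memory, key, value):
    """Gradient of 0.5 * ||memory @ key - value||^2 with respect to memory."""
    error = [p - v for p, v in zip(predict(memory, key), value)]
    return [[e * k for k in key] for e in error]


def magnitude(grad):
    return math.sqrt(sum(g * g for row in grad for g in row))


memory = [[1.0, 0.0], [0.0, 1.0]]  # memory that maps each key to itself
key = [1.0, 0.0]

routine = magnitude(surprise(memory, key, [1.0, 0.0]))    # matches expectation
anomalous = magnitude(surprise(memory, key, [0.0, 3.0]))  # breaks expectation
print(routine, anomalous)
```

An input that the memory already predicts yields a zero gradient and leaves nothing to store, while a sharply divergent input yields a large gradient that flags it as worth preserving.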

To prevent the system from reacting only to isolated spikes, Titans incorporates two stabilizing components. First, momentum: the model considers not just the current surprise but also recent history, allowing it to capture chains of related events. Second, adaptive weight decay, which acts as a “release valve” preventing memory overflow during extremely long sequences by gradually phasing out obsolete information.
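The interplay of momentum and decay can be sketched as a two-line update rule. In the code below, `eta` carries past surprise forward, `theta` scales the current surprise gradient, and `alpha` is the forgetting rate. The constant values are assumptions chosen for readability. The article calls the decay adaptive, which suggests the real model varies such rates with the input.

```python
def update_memory(memory, momentum, grad, eta=0.9, theta=0.1, alpha=0.01):
    """One memory write with momentum (eta) and weight decay (alpha)."""
    # Momentum: blend the running surprise with the current surprise gradient.
    new_momentum = [eta * s - theta * g for s, g in zip(momentum, grad)]
    # Weight decay: shrink old memory slightly, then add the update.
    new_memory = [(1 - alpha) * m + s for m, s in zip(memory, new_momentum)]
    return new_memory, new_momentum


memory, momentum = [0.0], [0.0]
# One surprising input (non-zero gradient) followed by two routine ones.
for grad in ([-1.0], [0.0], [0.0]):
    memory, momentum = update_memory(memory, momentum, grad)
    print(memory, momentum)
```

Because of momentum, the memory keeps moving for a few steps after the surprising input has passed, which lets related follow-up tokens be captured. The decay term slowly shrinks whatever is never reinforced.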

MIRAS describes this entire class of models at a higher level of abstraction. Within its framework, any sequence architecture is viewed as an associative memory that learns to map keys to values while balancing two opposing forces: absorbing new information and preserving what was learned before. MIRAS identifies four core components: the structure of the memory itself, the mechanism for choosing what to attend to, the retention-and-forgetting process, and the optimization rule that updates memory.

Google places particular emphasis on MIRAS’s departure from conventional Euclidean metrics such as mean squared error or dot-product similarity. In most modern systems, these metrics underpin both what the model prioritizes when writing to memory and how that memory is regularized. They are convenient but sensitive to outliers and restrict the space of viable solutions. MIRAS opens the door to architectures governed by non-Euclidean loss functions and more diverse regularizers drawn from optimization theory and statistics.

Within this framework, Google researchers developed three specific attention-free models: YAAD, MONETA, and MEMORA.

YAAD uses a gentler Huber loss for errors and performs better on noisy data with typos or sporadic artifacts.
MONETA experiments with generalized norms, probing whether stricter mathematical “laws” for remembering and forgetting can improve long-term memory stability.
MEMORA attempts to stabilize memory to the utmost degree by forcing it to behave like a probability distribution, ensuring each update is controlled and precise.
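To illustrate why a Huber loss is gentler on outliers than squared error, the sketch below compares the two on small and large errors. The formula is the standard Huber definition. The threshold `delta=1.0` is an assumption, and the example says nothing about how YAAD itself applies the loss.

```python
def squared_loss(error):
    return 0.5 * error ** 2


def huber_loss(error, delta=1.0):
    """Quadratic for small errors, linear once |error| exceeds delta."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error ** 2
    return delta * (abs_error - 0.5 * delta)


for error in (0.5, 2.0, 10.0):
    print(f"error={error:>4}: squared={squared_loss(error):>5}, "
          f"huber={huber_loss(error):>5}")
```

For small errors the two losses agree, but a single large error (for example a typo or a corrupted token) contributes far less under Huber, so it pulls the memory update less strongly.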

In experiments, Titans and MIRAS-based models were evaluated against contemporary architectures such as Transformer++, Mamba-2, and Gated DeltaNet. Testing covered standard language-modeling datasets (C4, WikiText) and zero-shot commonsense benchmarks including HellaSwag and PIQA. According to Google, the new models achieve lower perplexity and higher accuracy than similarly sized baselines, while preserving linear scalability and parallel-training capability.

The advantages of Titans are most striking in tasks requiring extreme context lengths. On the BABILong benchmark, which demands reasoning across facts scattered throughout very long documents, Titans surpasses all baseline models — even heavyweight systems like GPT-4 — despite having far fewer parameters. The authors also report that Titans can scale context windows beyond two million tokens while maintaining performance.

Additional studies demonstrated that the depth of the memory module is crucial. With identical memory “size” but varying depth, deeper variants consistently achieved lower perplexity and scaled more gracefully as sequence length increased — further evidence that a rich, deep long-term memory genuinely helps models navigate vast contexts.

Ultimately, Google positions Titans and MIRAS as a step toward a new generation of models that learn “in the moment,” preserve essential details across immense spans of context, and remain efficient enough for practical deployment. MIRAS unites online optimization, associative memory, and architectural design, while Titans shows how the speed of RNNs can be harmonized with the expressive power of transformers in the era of long-context AI.
