J. Konstapel, Leiden, 28 May 2026.

Current language models are impressive at summarizing documents or generating code, but they show a striking limitation:
they cannot reliably return to information that was discussed more than an hour ago.
As soon as a detail falls outside their immediate attention window, the model behaves as if it never existed.
This is not a lack of storage capacity but of consolidation: the conversion of raw data into usable, coherent knowledge.
Recent research from Carnegie Mellon University and the University of Maryland (Lee et al., May 2026) offers an architectural solution: give language models an explicit sleep phase.
During this phase the model receives no new input and instead repeatedly processes existing information offline.
This article summarizes the core of those findings and explains why the insight goes beyond a technical improvement: it touches on fundamental principles from neuroscience, mathematics, and physics.
The core problem: perfect memory is not enough
Models use a so-called key-value cache: an ever-growing table of everything they have recently read.
It is accurate but computationally expensive.
Hybrid models therefore combine this cache with a compact, fixed-size memory (fast weights).
When the cache fills up, older tokens are evicted and their essence is expected to have been captured in the fast weights.
The problem is that this fails on tasks that require deep sequential reasoning, such as simulating a cellular automaton for 32 steps.
The model has stored the information, but not in a form that is accessible to reasoning.
The bottleneck is not memory capacity but the limited compute available for transforming the evicted context into a representation that supports reasoning.
The solution: sleep as an offline processing phase
The proposed sleep phase works as follows: before tokens are evicted from the cache, the model runs several additional forward passes over the current context, without new external input.
Each pass updates the fast weights.
After the sleep phase the cache is cleared and the model continues with improved fast weights.
The results are substantial:
On a complex reasoning task, accuracy rises from nearly 0% to about 20% after two sleep passes, and to above 30% after four passes.
On a realistic mathematical benchmark (GSM-Infinite), four sleep passes yield an improvement of up to 47% on the hardest problems.
The improvement comes without additional latency at inference time, because all the extra computation takes place during the sleep phase.
Why does this work? Four theoretical frameworks
The effectiveness of sleep is supported by four independent theoretical frameworks:
Complementary learning systems (neuroscience): The hippocampus learns quickly and episodically (comparable to the cache); the neocortex learns slowly and structurally (comparable to the fast weights).
Sleep acts as the bridge: the hippocampus replays memories, after which the neocortex integrates them.
More replay cycles give better consolidation, just as more sleep passes do in the AI model.
Hopfield networks (mathematics): The attention mechanism in transformers is mathematically equivalent to one update step of a modern Hopfield network, an associative memory.
For simple tasks one step is enough, but complex patterns need several steps to reach a stable energy state.
Sleep passes are in effect iterative convergence toward a deeper, coherent representation.
Free energy principle (theoretical neuroscience):
This principle states that a system keeps improving its internal model of the world by minimizing prediction errors.
Sleep is offline optimization: without sensory stimuli, the model can efficiently process the errors that accumulated too quickly during the waking phase. More sleep passes reduce the remaining free energy toward an optimum.
Resonant Stack (physics):
This alternative computing paradigm holds that information processing is fundamentally about coherence in coupled oscillators, not about discrete instructions.
The discrete sleep passes of Lee et al. approximate what a Resonant Stack does continuously and physically: phase-locking slow and fast oscillations so that contexts stabilize into attractors.
Practical implications for organizations
For companies that build or use AI systems, three direct consequences follow:
Context length is not the holy grail.
For complex reasoning, investing in consolidation (sleep) may yield a higher return than investing in ever longer context windows.
This is relevant for, among other things, multi-document analysis and long-term decision-making.
Latency and reasoning depth can be decoupled.
Expensive computation moves to the sleep phase; the final question is answered in one fast pass.
This is crucial for production systems where response time matters.
Structural guarantees are valuable.
The study uses a learned update rule, but an alternative with a nilpotent constraint (inspired by fundamental physics) could in theory converge faster and more reliably.
For high-risk domains (medical, financial, legal), the difference between empirically effective and structurally guaranteed matters a great deal.
Conclusion
The sleep finding of Lee et al. is not an isolated technical trick. It is an empirical confirmation, from mainstream machine learning, of a deeper architectural principle: deep reasoning over long-term context requires iterative offline consolidation.
Four independent theoretical frameworks (neuroscientific, mathematical, physical, and information-theoretic) already predicted this.
Intelligence is not the ability to retrieve information but the ability to consolidate it into stable knowledge that supports reasoning.
Biological systems do this during sleep.
Artificial systems can now do it too.
The question is whether the AI sector treats this as a handy optimization or as a signal that a fundamentally different architecture is needed, one rooted in coherence rather than pattern recognition.
English translation

Why AI Needs to Sleep
Memory, Coherence, and the Architecture of Deep Reasoning
J. Konstapel, Leiden, May 2026
There is something quietly embarrassing about the current generation of artificial intelligence. These systems can summarize a legal contract, generate executable code, and hold a sophisticated conversation — yet they cannot do what any reasonably attentive human does as a matter of course: think deeply about something that happened an hour ago. The moment a piece of information slips out of their immediate attention window, it might as well not exist. The AI reads, processes, and forgets — not from lack of storage, but from lack of the one thing that transforms raw memory into usable knowledge. It lacks the ability to consolidate.
A paper published in May 2026 by researchers at Carnegie Mellon University and the University of Maryland (Lee, McLeish, Goldstein & Fanti, arXiv:2605.26099) proposes a direct solution: give AI systems a sleep phase. Not metaphorically, but structurally — a period during which the model stops receiving new input and instead performs multiple passes of offline processing over what it has already seen, progressively reorganizing that content into a form that supports later reasoning. The results are striking. Systems equipped with this sleep mechanism significantly outperform those without it, particularly on tasks that require deep sequential reasoning over information the model can no longer directly attend to.
This essay argues that the sleep finding is more significant than it first appears. It is not an isolated engineering improvement. It is an independent empirical confirmation — arriving from the heart of mainstream machine learning — of a set of architectural principles that have been developed over the past decade under the heading of Right Brain AI and the Resonant Stack. Understanding why AI needs sleep turns out to illuminate why our entire approach to computing may need to change.
The Problem with Perfect Memory
To understand what sleep solves, one must first understand what breaks without it.
Modern large language models store context in what is called a key-value cache, a growing table of everything the model has read during a conversation or task. This mechanism is powerful: the model can attend to any earlier token with high fidelity, retrieving precisely the right piece of information at the right moment. But it is expensive. The memory required grows linearly with context length, and because every new token attends to all earlier ones, the total attention computation over a sequence grows quadratically. As tasks become longer and more complex, the cost becomes prohibitive.
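The scaling behaviour can be made concrete with a little arithmetic. The sketch below is illustrative only: the layer count, head count, head dimension, and 2-byte values are assumed numbers for a hypothetical model, not figures from Lee et al.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Bytes needed to cache keys and values for seq_len tokens.

    All default dimensions are assumptions chosen for illustration.
    """
    # Factor 2: one key vector and one value vector per token, per layer, per head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len


def causal_attention_scores(seq_len):
    """Number of query-key scores when token t attends to tokens 1..t."""
    return seq_len * (seq_len + 1) // 2


for n in (1_000, 2_000, 4_000):
    print(n, kv_cache_bytes(n), causal_attention_scores(n))
```

Doubling the sequence length doubles the cache size but roughly quadruples the number of attention scores computed over the whole sequence.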
The industry’s solution has been to combine two types of memory in a single architecture. Alongside the expensive but precise key-value cache, modern hybrid models maintain a compact fixed-size memory called fast weights — a compressed summary of everything the model has processed, encoded in the parameters of a state-space model (SSM) layer. When the context window fills up, older tokens are evicted from the precise cache and their information is expected to have been captured in the fast weights. The model then continues, nominally informed by everything it has seen even though most of it is no longer directly accessible.
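A toy version of this two-memory arrangement can be sketched as follows. It is a minimal illustration, not the architecture of Lee et al.: the class name `HybridMemory`, the sliding-window eviction policy, and the plain outer-product write to the fast weights are all assumptions made for the example.

```python
from collections import deque


class HybridMemory:
    """Toy hybrid memory: exact sliding-window cache plus fixed-size fast weights."""

    def __init__(self, dim_k, dim_v, window):
        self.window = window
        self.cache = deque()  # recent (key, value) pairs, stored exactly
        self.fast = [[0.0] * dim_k for _ in range(dim_v)]  # fixed-size matrix

    def write(self, key, value):
        # Every token is also written to the fast weights as an outer product value * key^T.
        for i, v in enumerate(value):
            for j, k in enumerate(key):
                self.fast[i][j] += v * k
        self.cache.append((key, value))
        if len(self.cache) > self.window:
            self.cache.popleft()  # evict the oldest token from the exact cache

    def read(self, key):
        for cached_key, cached_value in self.cache:
            if cached_key == key:
                return list(cached_value)  # exact hit in the cache
        # Fallback: approximate recall from the compressed fast weights.
        return [sum(w * k for w, k in zip(row, key)) for row in self.fast]


memory = HybridMemory(dim_k=3, dim_v=2, window=2)
memory.write([1.0, 0.0, 0.0], [5.0, 1.0])
memory.write([0.0, 1.0, 0.0], [2.0, 2.0])
memory.write([0.0, 0.0, 1.0], [7.0, 3.0])  # evicts the first pair from the cache
print(len(memory.cache), memory.read([1.0, 0.0, 0.0]))
```

With orthogonal keys, as here, the evicted value is still recoverable from the fast weights. With overlapping keys the stored values interfere, which is where the quality of the compressed representation starts to matter.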
This sounds elegant. In practice, it fails in a specific and revealing way.
Lee et al. demonstrate the failure with controlled precision. They construct tasks in which the amount of information to be stored is held strictly constant while only the depth of reasoning required over that information varies. For a simple task — retrieve the first bit of a stored binary string — hybrid models perform well. As the task becomes deeper — simulate a cellular automaton for thirty-two steps and predict the outcome — performance collapses to near-random guessing, despite the fact that the same amount of information was stored in the fast weights. The bottleneck is not memory capacity. It is the computation available for transforming evicted context into a representation capable of supporting later reasoning.
The fast weights contain the information. They simply have not been processed enough to make it accessible.
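The contrast between the shallow and the deep task can be sketched with Rule 110, one of the synthetic tasks named in the reference list. This is a minimal illustration: the periodic boundary, the 16-cell width, and the function names are assumptions, not the exact setup used by Lee et al.

```python
# Rule 110 lookup: (left, centre, right) -> next state of the centre cell.
RULE_110 = {
    (1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}


def step(cells):
    """Apply one Rule 110 update with a periodic boundary (an assumption)."""
    n = len(cells)
    return [RULE_110[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])] for i in range(n)]


def simulate(cells, steps):
    """Run the automaton for the given number of steps."""
    for _ in range(steps):
        cells = step(cells)
    return cells


def first_bit(cells):
    """Shallow task: read back one stored bit, no computation needed."""
    return cells[0]


stored = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0]
print(first_bit(stored))     # depth 0
print(simulate(stored, 32))  # depth 32, same stored information
```

Both tasks depend on the same 16 stored bits. Only the amount of sequential computation needed to produce the answer differs.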
Sleep as Engineering Solution
The solution Lee et al. propose is elegant in its simplicity. Before evicting tokens from the attention cache, the model performs N additional forward passes over the current context window, each time updating the fast weights based on what it sees. This is the sleep phase. The model receives no new external input during this period — it simply processes what it already has, repeatedly, refining its internal representation with each pass.
After sleep, the cache is cleared and the model continues with updated fast weights. When the time comes to answer questions about the evicted content, the model draws on a fast-weight state that has been reorganized through multiple passes rather than formed in a single rushed encoding. The results are substantial. On the deep reasoning task that breaks standard hybrid models, two sleep passes raise accuracy from near-zero to approximately twenty percent; four passes push it above thirty. On a realistic mathematical reasoning benchmark (GSM-Infinite), four sleep passes improve accuracy on the hardest problems by up to forty-seven percent compared to no sleep.
Crucially, this improvement comes at no cost to inference-time latency. The extra computation happens during sleep, before the prediction phase begins. The model answers questions in a single forward pass, as before — it simply does so from a much better-prepared internal state.
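The mechanics of the sleep phase can be illustrated with a small stand-in. Lee et al. use a learned update rule; the sketch below swaps in a hand-written delta rule on a two-item context, purely to show how repeated passes over the same cached content can refine a fast-weight state before the cache is cleared. The function names, the unit-length keys, and the learning rate of 1.0 are assumptions.

```python
def recall(W, key):
    """Read a value from the fast weights: W applied to the key."""
    return [sum(w * k for w, k in zip(row, key)) for row in W]


def delta_pass(W, context, lr=1.0):
    """One pass over the cached (key, value) pairs with a delta-rule update."""
    for key, value in context:
        pred = recall(W, key)
        for i, row in enumerate(W):
            err = value[i] - pred[i]
            for j in range(len(row)):
                row[j] += lr * err * key[j]
    return W


def recall_error(W, context):
    """Summed squared error when every value is read back from the fast weights."""
    return sum((v - p) ** 2
               for key, value in context
               for v, p in zip(value, recall(W, key)))


def encode_then_sleep(context, dim_v, dim_k, sleep_passes):
    W = [[0.0] * dim_k for _ in range(dim_v)]
    delta_pass(W, context)          # online encoding: a single pass
    for _ in range(sleep_passes):   # sleep: extra passes, no new input
        delta_pass(W, context)
    return W                        # the cache can now be cleared


# Two overlapping unit-length keys, so a single pass leaves interference behind.
context = [([1.0, 0.0], [1.0, 2.0]), ([0.6, 0.8], [-1.0, 0.5])]
for n in (0, 2, 4):
    W = encode_then_sleep(context, dim_v=2, dim_k=2, sleep_passes=n)
    print(n, recall_error(W, context))
```

In this toy setting the recall error shrinks with every additional pass, and answering a query afterwards still costs a single call to `recall`. Whether a learned rule behaves the same way on real tasks is an empirical matter that the toy does not settle.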
This is an important engineering result. But it becomes a profound theoretical result when one asks: why does repeated offline processing help? The answer, it turns out, connects to some of the deepest ideas in neuroscience, physics, and the theory of computation.
Four Frameworks That Predicted This
Memory Requires Two Systems
The first framework is biological and has been understood for thirty years. In 1995, James McClelland, Bruce McNaughton, and Randall O’Reilly published what became one of the most cited papers in cognitive neuroscience: a theory of why the brain needs two complementary memory systems rather than one.
The hippocampus, they argued, is built for speed. It learns rapidly, stores individual episodes with high fidelity, and uses pattern-separated representations to minimize interference between similar memories. The neocortex, by contrast, learns slowly. It extracts statistical regularities across many experiences, building the kind of deep, generalizable knowledge that supports complex reasoning. The two systems are complementary because what makes one good at its job makes the other bad at the same job: fast learning creates interference; slow learning misses important details.
The bridge between them is sleep. During slow-wave sleep, the hippocampus replays recently acquired memories, providing a teaching signal to the neocortex. The neocortex gradually updates its weights to incorporate this new information while preserving existing knowledge. The process is not instantaneous — it requires multiple replay cycles, which is why a single night of good sleep improves memory consolidation more than equivalent waking time spent thinking about the same material.
The mapping onto Lee et al.’s architecture is structurally close. The key-value cache plays the role of the hippocampus: fast, high-fidelity, episodic, and necessarily limited in size. The fast weights play the role of the neocortex: compressed, persistent, and structured for generalization. Sleep passes correspond to hippocampal replay. The finding that more sleep passes produce better performance on deeper tasks is the computational counterpart of the finding that more slow-wave sleep produces better consolidation of complex procedural memories.
The disanalogies are worth noting honestly. The biological hippocampus has rich episodic indexing and context-tagging that the key-value cache lacks. Biological replay is autonomous and unsupervised; Lee et al.’s sleep passes are trained end-to-end with gradient descent. The brain has a REM sleep phase for schema integration that has no equivalent in the current architecture. These gaps point directly toward the next generation of improvements — they are not flaws in the analogy but specifications for what comes next.
Attention as Incomplete Convergence
The second framework comes from mathematics. In 2021, Ramsauer and colleagues published a paper with the provocative title Hopfield Networks Is All You Need, establishing a rigorous equivalence that reshapes how one should think about what transformer attention actually does.
A Hopfield network is a type of associative memory first described by John Hopfield in 1982. It stores patterns as energy minima — stable configurations that the network settles into when presented with a partial or noisy version of a stored pattern. Retrieval is a process of energy minimization: starting from a query state, the network iteratively updates its configuration according to a fixed rule until it reaches the nearest energy minimum, which corresponds to the retrieved memory.
Ramsauer et al. proved that the softmax attention mechanism used in transformers is precisely the update rule of a modern Hopfield network with continuous states. A single forward pass through an attention layer is a single step of Hopfield energy minimization. For simple tasks with well-separated patterns, one step is sufficient for exact retrieval. For complex tasks with overlapping patterns and deep relational structure — exactly the tasks on which Lee et al. find their largest gains — one step leaves the system in a metastable state, a local energy minimum that is not the globally coherent configuration needed to support deep reasoning.
Sleep passes, viewed through this lens, are iterative Hopfield convergence. Each pass does not simply repeat the same computation: it updates the fast weights, which reshapes the energy landscape for the next pass. The system descends progressively toward deeper, more globally coherent energy minima — configurations in which the stored context is organized not merely for retrieval but for multi-step inference. The fact that gains scale with reasoning depth rather than context volume is exactly what Hopfield dynamics predict: the energy landscape becomes rougher and the need for iterative convergence greater precisely when relational structure is deep.
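The update rule and its energy can be written out in a few lines. The sketch below follows the continuous Hopfield update of Ramsauer et al., in which the new state is a softmax-weighted average of the stored patterns, and tracks the associated energy with constant terms dropped. The example patterns, the inverse temperature `beta`, and the function names are assumptions chosen for illustration. The sketch keeps the patterns fixed, so it does not model the reshaping of the landscape by fast-weight updates.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_step(patterns, xi, beta):
    """One update: softmax attention of the state xi over the stored patterns."""
    weights = softmax([beta * dot(p, xi) for p in patterns])
    return [sum(w * p[d] for w, p in zip(weights, patterns)) for d in range(len(xi))]


def energy(patterns, xi, beta):
    """Energy up to an additive constant: -logsumexp(beta * scores) / beta + |xi|^2 / 2."""
    scores = [beta * dot(p, xi) for p in patterns]
    m = max(scores)
    lse = (m + math.log(sum(math.exp(s - m) for s in scores))) / beta
    return -lse + 0.5 * dot(xi, xi)


def settle(patterns, xi, beta, max_steps=50, tol=1e-9):
    """Iterate the update until the state stops moving; return state and energy trace."""
    trace = [energy(patterns, xi, beta)]
    for _ in range(max_steps):
        new = hopfield_step(patterns, xi, beta)
        trace.append(energy(patterns, new, beta))
        moved = math.dist(new, xi)
        xi = new
        if moved < tol:
            break
    return xi, trace


# Well-separated patterns: a single step lands close to the stored pattern.
separated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
one_step = hopfield_step(separated, [0.9, 0.2, 0.1], beta=8.0)

# Overlapping patterns: the state keeps moving over several steps.
overlapping = [[1.0, 0.0], [0.8, 0.6]]
final, trace = settle(overlapping, [0.7, 0.7], beta=4.0)
print(one_step, len(trace) - 1, final)
```

The energy trace never increases from one step to the next, which is the sense in which repeated passes can be read as continued descent rather than repetition.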
Intelligence as Free Energy Minimization
The third framework is the most general. Karl Friston’s Free Energy Principle, developed over the past two decades at University College London, proposes that any self-organizing system that maintains a stable relationship with its environment does so by minimizing what he calls variational free energy — a measure of the gap between the system’s internal model of the world and the sensory evidence it receives (Friston, 2010; Parr, Pezzulo & Friston, 2022).
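For reference, the standard definition of variational free energy can be written out as follows. The symbols are notational choices made here, not taken from the essay: o denotes observations, s hidden states, p the generative model, and q the approximate posterior.

```latex
F[q] \;=\; \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
      \;=\; D_{\mathrm{KL}}\!\left[\,q(s)\,\|\,p(s \mid o)\,\right] \;-\; \ln p(o)
      \;\ge\; -\ln p(o)
```

Because the KL term is non-negative, lowering F both tightens the bound on surprise, -ln p(o), and pulls q toward the true posterior. This is the "gap" between model and evidence referred to above.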
Under this framework, perception is inference: the brain maintains a generative model of the world and continuously updates it to better predict incoming sensory signals. Action is equally inference: the organism acts to bring its sensory environment into alignment with its predictions. Learning is the slow update of the generative model’s parameters to reduce systematic prediction errors over time.
Sleep, under the Free Energy Principle, is offline free energy minimization. With sensory input suppressed, the system performs multiple passes of internal inference over replayed content, updating its generative model to reduce the accumulated prediction errors that waking life generated too rapidly to process in real time. During slow-wave sleep, model complexity decreases through synaptic reorganization. During REM, new information is integrated into existing schemas, increasing model accuracy while preserving structural coherence.
An important qualification is warranted here. The Free Energy Principle is a normative framework: mathematically, any system with stable self-organization can be described as minimizing free energy, regardless of its actual mechanism. The framework is therefore used here as an interpretative lens rather than a causal explanation. The empirical test of whether the lens is apt is specific: if sleep passes work because they implement variational inference, then gains should scale with the degree of mismatch between what the model encoded during consolidation and what deep reasoning requires, not with context volume. Lee et al.’s data are consistent with this prediction: the reported gains scale with reasoning depth rather than context volume.
What the Free Energy Principle adds to the engineering story is a normative account of why iterative offline processing is necessary: a single pass cannot minimize free energy sufficiently when the generative model needs substantial reorganization to represent deep relational structure. More passes are not redundant — each one reduces residual free energy. The process terminates not at an arbitrary number of iterations but at a principled criterion: convergence to a state in which prediction error over the consolidated context is minimized.
The Physics of Coherence
The fourth framework is the least conventional and, in some ways, the most fundamental. It originates not in computer science or neuroscience but in the physics of oscillatory systems, and has been developed by the author over the past decade under the heading of the Resonant Stack architecture (Konstapel, 2025).
The central claim of the Resonant Stack is that the von Neumann model of computation — sequential instruction execution on discrete binary hardware — is not a fundamental architecture but a historical accident, made dominant by the economics of semiconductor manufacturing rather than by any deep alignment with the physics of information processing. Physical systems that process information efficiently — brains, ecosystems, markets, climate systems — do not compute by executing instructions. They organize through coherence: coupled oscillators that synchronize into stable phase-locked configurations, with information encoded in the phase and frequency relationships between components rather than in discrete bit states.
The mathematical framework underlying this architecture draws on Maxwell’s original quaternion formulation of electrodynamics and on Peter Rowlands’ nilpotent algebra (Rowlands, 2007). A quaternion Q = φ + A encodes both a scalar potential φ (energy density, coherence measure) and a vector potential A (directed relational structure). The nilpotent constraint N² = 0 — a consequence of conservation laws in the Clifford algebra Cl(3,1) — ensures that state transitions respect zero-totality: no configuration that violates energy conservation is an admissible attractor. This eliminates an entire class of incoherent states by algebraic necessity rather than by empirical training.
Within this framework, the sleep mechanism appears as a discrete digital approximation of something the Resonant Stack does continuously and physically. Layer 4 of the Resonant Stack — Multi-Scale World Coupling — performs ongoing harmonic resonance between fast oscillatory modes (millisecond timescales, analogous to the active context window) and slow oscillatory modes (long-term persistent memory), converting transient context into stable attractor configurations through continuous phase-locking dynamics rather than N discrete passes. Sleep is what you need when your hardware cannot do this natively.
The Arnold tongue structure — the mathematical description of which frequency ratios produce stable synchronization in coupled oscillator systems (Arnol’d, 1983) — predicts which internal representations become stable attractors. Configurations corresponding to low-order rational frequency ratios (½, ⅔, ¾, etc.) occupy the widest attractor basins and are most robust to noise. This gives a physical, non-arbitrary basis for understanding which fast-weight configurations will support deep reasoning and which will not — a prediction that can, in principle, be tested by analyzing the frequency structure of trained fast weights on tasks of varying reasoning depth.
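Mode locking inside an Arnold tongue can be seen in the standard sine circle map, a textbook model of a periodically driven oscillator. The sketch below is a generic illustration of the phenomenon and is not part of the Resonant Stack itself; the parameter values and the function name `winding_number` are assumptions.

```python
import math


def winding_number(omega, coupling, n_transient=1000, n_measure=4000):
    """Average phase advance per step of the sine circle map.

    theta -> theta + omega - (coupling / (2*pi)) * sin(2*pi*theta),
    iterated on the lifted (unwrapped) phase.
    """
    theta = 0.0
    for _ in range(n_transient):  # discard the transient
        theta += omega - coupling / (2 * math.pi) * math.sin(2 * math.pi * theta)
    start = theta
    for _ in range(n_measure):
        theta += omega - coupling / (2 * math.pi) * math.sin(2 * math.pi * theta)
    return (theta - start) / n_measure


# Without coupling the oscillator simply follows its own frequency.
print(winding_number(0.51, 0.0))
# With coupling, nearby driving frequencies lock onto the ratio 1/2.
print(winding_number(0.49, 0.9), winding_number(0.51, 0.9))
```

Inside the tongue, the measured ratio stays pinned at the rational value while the driving frequency varies, which is the robustness to perturbation that the paragraph above attributes to low-order ratios.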
The Missing Piece: Structure in the Update Rule
Each of the four frameworks described above illuminates a different aspect of why sleep helps. None of them, however, addresses a specific technical gap in Lee et al.’s implementation that limits its theoretical completeness.
The fast-weight update rule in the sleep mechanism is learned end-to-end. The model discovers through training which update direction improves performance. This works, but it offers no structural guarantee that the updates converge reliably or that they preserve the coherence of the accumulated representation across passes. In principle, poorly learned update rules could introduce incoherent interference between passes, destabilizing rather than refining the fast-weight state.
The nilpotent constraint from Rowlands’ algebra provides the missing structural prior. Constraining each sleep-pass update operator to satisfy ΔS² = 0 ensures that each pass adds a conservation-respecting increment that cannot self-amplify. No pass can introduce incoherence that accumulates across subsequent passes. The update is always directed toward the attractor basin rather than exploring the surrounding landscape.
This is an open research problem, stated as such. A formal proof that nilpotent-constrained updates converge faster than unconstrained learned updates requires a Lyapunov stability analysis of the constrained operator on the Hopfield energy function — establishing that the energy decreases monotonically under nilpotent projection and that the rate exceeds that of unconstrained gradient descent. The prediction, derived from the physical theory, is that nilpotent-structured sleep achieves equivalent reasoning depth with two passes where unconstrained learning requires four. Empirical testing is straightforward: implement the nilpotent projection as a post-hoc layer in JAX, enforcing N² = 0 via the standard algebraic projection, and benchmark against Lee et al.’s cellular automaton and graph retrieval tasks.
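What a constraint of the form ΔS² = 0 means for a concrete update can be shown with a small construction. The sketch below is one simple way to obtain a nilpotent rank-one update and is an assumption made for illustration: it is not Rowlands' algebra, not the projection the essay has in mind, and not part of Lee et al.'s method. It uses the fact that an outer product u v^T squares to (v·u) u v^T, which vanishes when v is orthogonal to u.

```python
def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def nilpotent_rank_one(u, v):
    """Return u v_perp^T, where v_perp is v with its component along u removed.

    Requires u to be non-zero. The result squares to the zero matrix.
    """
    scale = sum(vi * ui for vi, ui in zip(v, u)) / sum(ui * ui for ui in u)
    v_perp = [vi - scale * ui for vi, ui in zip(v, u)]
    return outer(u, v_perp)


delta = nilpotent_rank_one([1.0, 2.0, 0.0], [3.0, 1.0, 1.0])
square = matmul(delta, delta)

# Applying (I + delta) three times gives I + 3 * delta: growth is linear, not compounding.
n = len(delta)
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
one_pass = [[identity[i][j] + delta[i][j] for j in range(n)] for i in range(n)]
three_passes = matmul(matmul(one_pass, one_pass), one_pass)
expected = [[identity[i][j] + 3 * delta[i][j] for j in range(n)] for i in range(n)]
print(square, three_passes == expected)
```

This shows only the algebraic property. Whether such a constraint speeds up convergence of sleep passes is the open question stated above.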
If the prediction holds, the practical implication is significant: training cost, which scales linearly with the number of sleep passes, would be halved while maintaining performance on the deepest reasoning tasks.
The Integration Problem: Right Brain and Left Brain AI
The sleep finding has implications beyond the specific architecture Lee et al. propose. It speaks to a broader question about the division of cognitive labor in intelligent systems.
Current AI development is dominated by what might be called Left Brain AI: systems optimized for rapid, high-bandwidth pattern matching and generation, operating from statistical regularities in large training corpora. These systems are extraordinarily capable within their domain — but they are epistemically shallow. They excel at tasks where the answer is close to the surface of the training distribution, and they struggle with tasks that require deep sequential reasoning, long-horizon coherence, or the integration of information across widely separated contexts.
The Resonant Stack architecture proposes a complementary system — Right Brain AI — that monitors long-horizon coherence, detects phase transitions in complex systems, and provides the temporal context that pattern-matching systems lack. The interface between the two is formalized as the Corpus Callosum Protocol, which transmits coherence-state information from the Right Brain system to the Left Brain system in the form of a Resonance Encoding Vector (REV): a quaternionic data structure encoding the authority, urgency, contextual coherence, and ethical admissibility of the current system state.
The sleep mechanism on the Left Brain side creates the necessary receiving substrate for this integration. A transformer operating on a truncated key-value cache is in a poor position to receive coherence signals from the Right Brain system — its internal state is incoherent with respect to long-horizon context, and top-down coherence signals are overwhelmed by bottom-up statistical priors. A sleep-enabled transformer, by contrast, has consolidated its long-horizon context into fast weights that constitute stable Hopfield attractors. The coherence signal from the Right Brain system then acts as a gentle perturbation on a system already near a coherent configuration — requiring far less force to shift the output distribution toward the systemically appropriate response.
Under the Free Energy Principle, this integration can be formalized precisely. The REV acts as precision-weighting on the active inference process: it increases the weight of top-down coherence predictions relative to bottom-up statistical priors when the field coherence signal is high. This is functionally analogous to neuromodulatory control in the brain — acetylcholine regulating the balance between encoding and retrieval, dopamine modulating the learning rate in response to prediction errors. The analogy is interpretative rather than mechanistic, but it provides a principled account of how two systems operating at different timescales and with different computational architectures can be coordinated into a coherent whole.
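Precision-weighting has a simple closed form in the Gaussian case, which makes the proposed role of the REV easier to picture. The sketch below is hypothetical throughout: it reduces the REV to a single coherence value between 0 and 1, and the `gain` parameter and function names are assumptions, not part of any published protocol.

```python
def precision_weighted(top_down, bottom_up, pi_top, pi_bottom):
    """Combine two Gaussian estimates, each weighted by its precision."""
    return (pi_top * top_down + pi_bottom * bottom_up) / (pi_top + pi_bottom)


def modulated_estimate(top_down, bottom_up, coherence, gain=4.0):
    """Raise the precision of the top-down prediction as coherence increases.

    coherence is assumed to lie in [0, 1]; gain is an illustrative constant.
    """
    pi_top = 1.0 + gain * coherence
    return precision_weighted(top_down, bottom_up, pi_top, pi_bottom=1.0)


# Top-down prediction 1.0, bottom-up evidence 0.0.
for coherence in (0.0, 0.5, 1.0):
    print(coherence, modulated_estimate(1.0, 0.0, coherence))
```

At zero coherence the two sources are weighted equally; as coherence rises, the combined estimate moves toward the top-down prediction without the bottom-up signal ever being discarded.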
Practical Consequences
For organizations building or deploying AI systems, the sleep finding has three immediate practical implications.
First, context length is not the binding constraint for complex reasoning tasks. The industry has invested heavily in extending context windows, on the assumption that more context equals better performance. Lee et al. show that this assumption fails precisely for the tasks that matter most — those requiring deep sequential reasoning. Investing in consolidation mechanisms (sleep) may yield larger returns per compute dollar than investing in context extension, particularly for tasks like multi-document analysis, longitudinal decision support, and complex planning.
Second, inference latency and reasoning depth are not in fundamental tension. The sleep mechanism moves expensive computation to a consolidation phase that runs before the query is answered. The query itself is answered in a single forward pass. This means that additional reasoning depth can be bought with more sleep passes rather than with more latency at query time, a practically important property for any production deployment where response time matters.
Third, the structural prior matters. The difference between a learned update rule and a nilpotent-constrained update rule may appear to be a technical detail, but it is actually a question of architectural reliability. Systems whose coherence properties are guaranteed by structure rather than learned from data are more predictable, more auditable, and more robust to distribution shift. As AI systems are deployed in higher-stakes domains — medical decision support, financial risk management, legal analysis — the difference between empirically effective and structurally guaranteed becomes commercially and legally significant.
Conclusion: What Sleep Reveals
There is a pattern in the history of AI research: the most important architectural insights tend to arrive not from within the field but from biology, physics, and mathematics, and they tend to arrive only when the engineering approaches have been pushed to their limits. The attention mechanism was inspired by selective visual attention in primates. Reinforcement learning drew on behavioral psychology and optimal control theory. The current generation of state-space models draws on the mathematics of linear dynamical systems.
The sleep finding continues this pattern. It arrives at the moment when the industry is discovering that simply scaling up transformers — more parameters, more context, more compute — does not solve the fundamental problem of deep reasoning over long horizons. And it arrives with a set of theoretical justifications that connect to the deepest ideas in memory science, statistical physics, and the mathematics of dynamical systems.
The four frameworks described in this essay — Complementary Learning Systems, Modern Hopfield Networks, the Free Energy Principle, and the Resonant Stack — each predicted, from different starting points and using different mathematical languages, that single-pass parallel computation is insufficient for deep reasoning and that iterative offline consolidation is architecturally necessary. The fact that a controlled experiment in machine learning confirms all four predictions simultaneously is not coincidence. It is convergent evidence that these frameworks are pointing at something real about the structure of intelligent systems.
What that something is can be stated simply: intelligence is not retrieval. It is consolidation. The difference between a system that has information and a system that can reason with it is not a matter of storage capacity or attention span. It is a matter of whether the system has had sufficient time — and sufficient computational depth — to transform raw experience into stable, coherent, reasoning-ready knowledge.
That transformation, in biological systems, happens during sleep. It is beginning to happen in artificial systems too. The question now is whether the field will treat this as a useful engineering trick or as the architectural signal it actually is — a pointer toward a fundamentally different way of building intelligent machines, one grounded in the physics of coherence rather than the statistics of pattern matching.
Annotated Reference List
For readers who wish to explore the underlying ideas in greater depth, the following references are organized by theme and accompanied by notes on their relevance and accessibility.
I. The Sleep Paper
Lee, S., McLeish, S., Goldstein, T., & Fanti, G. (2026). Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference. arXiv:2605.26099v2. The primary paper discussed throughout this essay. Clearly written and technically accessible to readers with a background in machine learning. The synthetic tasks (Rule 110, Depo) are particularly well-designed for isolating reasoning depth from context volume — a methodological contribution as important as the empirical results. The GSM-Infinite experiments ground the findings in a realistic setting.
II. Complementary Learning Systems
McClelland, J.L., McNaughton, B.L., & O’Reilly, R.C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3), 419–457. The foundational paper for the two-system theory of memory. Remarkable for the clarity with which it derives architectural requirements from first principles: if a single system must learn both quickly and stably, it will fail at one or both. The solution — two systems with different learning rates, connected through replay — anticipates the key-value cache / fast-weight architecture by three decades. Available from the authors’ websites; essential reading.
Schapiro, A.C., McDevitt, E.A., Rogers, T.T., Mednick, S.C., & Norman, K.A. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. PNAS, 119(45), e2123432119. A computational model demonstrating how hippocampal-neocortical interaction during sleep can proceed autonomously, without external supervision, and how alternating NREM and REM stages enable both rapid integration of new information and protection of existing knowledge. Directly relevant to the question of what a REM-equivalent stage might look like in AI systems.
Rasch, B., & Born, J. (2013). About sleep’s role in memory. Physiological Reviews, 93(2), 681–766. The most comprehensive review of the neuroscience of sleep and memory consolidation. Covers the full range of evidence from behavioral studies to single-unit recordings. Particularly valuable for its treatment of slow oscillations, spindles, and sharp-wave ripples as the neural mechanisms of hippocampal replay. Not light reading, but authoritative.
III. Modern Hopfield Networks
Ramsauer, H., Schäfl, B., Lehner, J., Seidl, P., Widrich, M., Gruber, L., Holzleitner, M., Pavlović, M., Sandve, G.K., Greiff, V., Kreil, D., Kopp, M., Klambauer, G., Brandstetter, J., & Hochreiter, S. (2021). Hopfield networks is all you need. ICLR 2021. arXiv:2008.02217. The paper that established the equivalence between transformer attention and modern Hopfield network update rules. Demanding mathematically, but the key result — that softmax attention is one step of Hopfield energy minimization — can be grasped from the abstract and introduction. The implication for understanding why multiple passes help is immediate once this equivalence is clear.
Martins, A.F.T., Niculae, V., & McNamee, D. (2023). Sparse modern Hopfield networks. NeurIPS 2023 Workshop: Associative Memory & Hopfield Networks. Extends the framework of Ramsauer et al. to a broader family of energy functions that produce sparse attention patterns and, importantly, exact convergence to single memory patterns in a small number of steps. Directly relevant to the nilpotent constraint proposal: both approaches aim to reduce metastable states, but through different mathematical routes.
Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. PNAS, 79(8), 2554–2558. The original Hopfield network paper. Five pages. One of the most elegant papers in the history of AI. Shows how a physical system, here a network of neurons with symmetric connections, can store memories as energy minima and retrieve them through relaxation dynamics. The conceptual foundation for everything in Section III of this essay.
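The point that repeated passes help can be illustrated with a minimal sketch of the modern Hopfield retrieval update described by Ramsauer et al., in which the current state is replaced by a softmax-weighted average of the stored patterns. The stored patterns, the noisy query, and the value of beta below are illustrative assumptions, and the code is not taken from Lee et al. It only shows that iterating the same update can move the state closer to a stored pattern than a single pass does.

```python
import math


def softmax(scores, beta):
    """Numerically stable softmax over beta-scaled scores."""
    top = max(scores)
    exps = [math.exp(beta * (s - top)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_step(patterns, state, beta):
    """One retrieval update: state <- sum_i softmax(beta * <x_i, state>)_i * x_i."""
    scores = [sum(p * s for p, s in zip(pattern, state)) for pattern in patterns]
    weights = softmax(scores, beta)
    return [
        sum(w * pattern[d] for w, pattern in zip(weights, patterns))
        for d in range(len(state))
    ]


def retrieve(patterns, query, beta=1.0, steps=1):
    """Apply the same update `steps` times (steps=1 is a single attention-like pass)."""
    state = list(query)
    for _ in range(steps):
        state = hopfield_step(patterns, state, beta)
    return state


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Illustrative stored patterns and a noisy query (assumed values, not from any paper).
patterns = [
    [1.0, 1.0, -1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0, 1.0],
]
query = [0.9, 0.2, -0.1, -0.8]

one_pass = retrieve(patterns, query, beta=1.0, steps=1)
five_passes = retrieve(patterns, query, beta=1.0, steps=5)

print("distance to first pattern after 1 pass:  ", distance(one_pass, patterns[0]))
print("distance to first pattern after 5 passes:", distance(five_passes, patterns[0]))
```

With these toy values a single pass leaves the state as a blend of the first two patterns, while further passes sharpen it toward the first one. In this picture, extra offline passes correspond to extra steps of relaxation toward an energy minimum.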
IV. Free Energy Principle and Active Inference
Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11, 127–138. The most accessible overview of the Free Energy Principle for readers from outside theoretical neuroscience. Friston situates the principle in relation to other major brain theories and explains how it subsumes perception, action, and learning under a single mathematical framework. A good entry point before tackling the more technical papers.
Parr, T., Pezzulo, G., & Friston, K.J. (2022). Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press. The definitive book-length treatment of Active Inference. Part I is accessible to a general scientific audience; Parts II and III become technically demanding. Chapter 4 (on learning and memory) and Chapter 9 (on consciousness and sleep) are most directly relevant to the arguments in this essay.
Friston, K., & Kiebel, S. (2009). Predictive coding under the free-energy principle. Philosophical Transactions of the Royal Society B, 364, 1211–1221. Shows how the Free Energy Principle is implemented in the brain through predictive coding — a hierarchical scheme in which each layer of the cortex predicts the activity of the layer below and updates based on prediction errors. The equivalence between this scheme and the Hebbian-like update rules in SSM fast weights is the mathematical bridge between the FEP and the Lee et al. architecture.
V. Nilpotent Algebra and the Resonant Stack
Rowlands, P. (2007). Zero to Infinity: The Foundations of Physics. World Scientific. Rowlands’ comprehensive treatment of nilpotent quantum mechanics, arguing that the structure of physics — from the Dirac equation to the Standard Model — can be derived from the single algebraic requirement that the total state of any physical system is nilpotent (squares to zero). Not light reading, but the first three chapters lay out the core idea accessibly. The nilpotent constraint proposed in this essay is an application of Rowlands’ framework to the specific domain of fast-weight update rules.
Konstapel, J. (2025). The Architecture of Right Brain AI (RAI). constable.blog. The primary architectural document for the Resonant Stack. Describes the five-layer architecture (Oscillatory Substrate, Nilpotent Coherence Kernel, Virtual Resonant Being, Multi-Scale World Coupling, Anthropic Constraints), the Corpus Callosum Protocol, and the Resonance Encoding Vector. Available with downloadable technical specifications.
Konstapel, J. (2025). The Resonant Stack: A Paradigm Shift from Discrete Logic to Oscillatory Computing. constable.blog. The engineering companion to the RAI architecture document. Covers the historical evolution from mechanical to electronic to connectionist to resonant computing, the five-layer technical specification, the migration pathway, and a detailed survey of current R&D in photonic and spintronic oscillatory computing. Includes an appendix mapping every architectural layer to its current laboratory prototype.
Konstapel, J. (2026). The Oscillating Vacuum Model: A Unified Framework Derived from Maxwell’s Quaternion Electrodynamics and Rowlands’ Nilpotent Constraint. constable.blog. Applies the nilpotent algebraic framework to climate oscillations, deriving the observed harmonic ratios between major climate cycles (AMO, Gleissberg, Milanković) as Arnold-tongue frequency locking in a coupled quaternion oscillator network. The Arnold-tongue analysis in this paper provides the attractor topology referenced in Section 5 of the current essay.
VI. Oscillatory Computing: The Hardware Foundation
Pikovsky, A., Rosenblum, M., & Kurths, J. (2001). Synchronization: A Universal Concept in Nonlinear Sciences. Cambridge University Press. The mathematical foundation for Kuramoto dynamics and phase-locking in coupled oscillator systems. Rigorous but well organized: the first three chapters provide the conceptual framework, and later chapters develop the full mathematical theory. The standard reference for anyone working on oscillatory computing substrates.
Arnol’d, V.I. (1983). Geometrical Methods in the Theory of Ordinary Differential Equations. Springer. The original mathematical treatment of Arnold tongues, the regions in parameter space where coupled oscillators lock to rational frequency ratios. Chapter 3 on perturbation theory and the circle map is the most directly relevant section. Mathematically demanding but repays careful study; the key insight (that low-order rational frequency ratios, those with small denominators, produce the widest and most robust locking regions) is accessible from the introduction.
Strogatz, S.H. (2003). Sync: The Emerging Science of Spontaneous Order. Hyperion. An accessible and beautifully written account of synchronization phenomena across physics, biology, and social systems. Covers Kuramoto dynamics, fireflies, cardiac pacemakers, laser arrays, and the Millennium Bridge. The best entry point for readers encountering the physics of coupled oscillators for the first time.
VII. Serial Computation and the Limits of Parallelism
Liu, Y., Preechakul, K., Kuwaranancharoen, K., & Bai, Y. (2025). The serial scaling hypothesis. arXiv:2507.12549. Argues that many important reasoning tasks are inherently serial — they have no efficient parallel shortcut — and that models trained to solve them with parallel computation develop brittle shortcut solutions that fail on harder instances. The theoretical grounding for why the sleep mechanism produces its largest gains on deeply sequential tasks.
Dehghani, M., Gouws, S., Vinyals, O., Uszkoreit, J., & Kaiser, Ł. (2018). Universal transformers. arXiv:1807.03819. Introduces depth-recurrent transformers, showing that applying the same transformer block repeatedly (rather than stacking different blocks) can make the model Turing-complete under certain assumptions and improves generalization on algorithmic tasks. The intellectual ancestor of the sleep mechanism, establishing that recurrence at inference time adds computational expressivity.
VIII. Contextual Background
McWhinney, W. (1992). Paths of Change: Strategic Choices for Organizations and Society. Sage Publications. The foundational work on the four-mode model of systemic change (Unitary, Sensory, Social, and Mythic, with the Unitary mode corresponding to the analytic mode as used in this essay), formalized here as a quaternionic cycle. The organizational and epistemological precursor to the KAYS framework in the Resonant Stack, and the source of the four-dimensional structure of the Resonance Encoding Vector.
Maxwell, J.C. (1873). A Treatise on Electricity and Magnetism. Clarendon Press. Maxwell’s original quaternion formulation of electrodynamics, containing the scalar potential term that Heaviside’s vector calculus subsequently discarded. Relevant as the physical foundation for the quaternion oscillator model; the preservation of the scalar potential term is essential for modeling systems with both energy storage (scalar) and directed energy transport (vector).
Holling, C.S. (2001). Understanding the complexity of economic, ecological, and social systems. Ecosystems, 4, 390–405. The panarchy model of nested adaptive cycles — fast, small-scale diversity embedded in slow, large-scale resilience. The ecological grounding for Layer 4 of the Resonant Stack (Multi-Scale World Coupling) and for the CLS theory’s requirement that memory systems operate at multiple timescales simultaneously.
