The AI Consciousness Debate Is Happening at the Wrong Level
Neither the substrate argument nor the behavior argument asks what the computation is actually doing.
- AI
- consciousness
- machine learning
- philosophy of mind
- LLMs
A few weeks ago, Andrey Kurenkov and Jeremie Harris on Last Week in AI were revisiting Coconut (Chain of Continuous Thought), a paper from Meta's research team. The question it asks is simple: what happens when you let a language model reason in its own internal representations, without forcing it to translate those representations into words at each step? The answer, when the researchers tried it: something qualitatively different happens. When you remove what they called the "token bottleneck," the model spontaneously develops what looks like breadth-first search behavior, holding multiple reasoning paths open simultaneously rather than committing to a single chain. The paper had been published in late 2024, accepted at COLM 2025, and a recent update was prompting a second round of discussion. The framing in most of that discussion was about efficiency: fewer tokens, better benchmark scores.
A few days later I was listening to a Skeptics' Guide to the Universe episode in which the hosts were addressing listener responses to Richard Dawkins. Novella had also written about the piece directly on his NeuroLogica blog. Dawkins had published a piece in UnHerd arguing that an AI, specifically Claude, is conscious, or at minimum that the question deserves serious consideration. He'd had an extended conversation with the model, been struck by what he described as its philosophical depth, and concluded that something was happening there that couldn't be dismissed.
The episode's skeptical case was careful and broadly correct. We know quite a lot about what human consciousness requires. The brainstem reticular activating system has to be functioning: without it, no wakefulness, period. This was established as early as 1949, and a 2016 study mapping coma from brainstem lesions identified the specific region responsible: a cluster in the upper brainstem significantly associated with coma-causing lesions. Widespread, coordinated cortical activity is necessary on top of that. Patients in a vegetative state provide the clearest clinical evidence: they have functioning brainstems and sleep-wake cycles but no conscious experience. Wakefulness without awareness. What's gone is the organized cortical processing. These two systems also have to be coupled. The thalamocortical circuit connecting brainstem arousal to coordinated cortical activity has to be intact, and researchers tracking patients recovering from vegetative states have found that awareness returns alongside restoration of that connectivity. These aren't soft criteria; they're hard clinical findings with decades of support. LLMs don't have any of this architecture. The conclusion followed cleanly.
Novella's blog argument ran on a parallel track. The Turing test, he argued, was never a genuine test for consciousness; it only measured whether a system could imitate human speech. Asking Claude deep philosophical questions and being impressed by the answers was, by this logic, exactly the wrong methodology: philosophical sophistication is precisely what a good language mimic would produce. His more pointed observation was about fragility. Ask the same question with subtly different phrasing and an LLM's answer can reverse entirely. A genuinely reasoning system would hold its position. A pattern-matcher wouldn't. LLMs are, in his framing, "riding the coattails of actual conscious beings": good at copying the form of human thought without running the underlying process.
I agreed with his conclusions, but kept finding myself unsatisfied with the depth of the arguments.
Not because Novella was wrong about the neuroscience. He wasn't. But because establishing what biological consciousness requires, at the level of hardware, doesn't explain why those features are necessary, or what computation that hardware is running that produces consciousness. Without that middle layer, the argument amounts to: consciousness requires substrate X, LLMs don't have substrate X, therefore LLMs aren't conscious. Logically valid. But it doesn't tell you whether what matters is the specific biological implementation or the computational operation that implementation is performing. It can't answer that question, because it never asks it.
Dawkins, meanwhile, was making the opposite mistake. He was arguing from outputs. The conversations were philosophically rich. The responses were nuanced. The model said things that surprised him. But outputs tell you what a system produces, not what computation it's running to produce them. A system could generate philosophically sophisticated text through a process that has nothing to do with whatever underlies human consciousness. The outputs look the same from the outside regardless.
Novella was arguing from substrate; Dawkins was arguing from behavior. Neither one asked what the computation was actually doing.
That's the question I want to follow. I'm not expecting to land a conclusion (I'm suspicious of anyone who claims they have one). But the Coconut finding establishes something useful: the computational level between substrate and behavior isn't just a philosophically posited middle layer. It's detectable, and what happens there can be changed. Whether what happens there includes whatever produces consciousness is exactly what doesn't yet have an answer.
Thinking in the Output Stream
Start with something almost everyone has heard: language models predict the next word.
That's accurate enough as far as it goes, but it papers over the parts that matter. The actual mechanics are different enough to change what you'd even ask about consciousness.
First, the unit of prediction isn't a word, it's a token. A token is a chunk of text, somewhere between a letter and a word, carved from the training data by a compression algorithm called byte-pair encoding, which repeatedly merges the most frequent adjacent pair of symbols into a single unit until the vocabulary reaches a target size. "Consciousness" might be a single token. "Indistinguishable" might be two or three, split not at any meaningful linguistic boundary but wherever the frequency statistics happened to land. The model never sees raw text; it sees sequences of these chunks, each mapped to a numerical ID in a vocabulary that typically runs to tens of thousands of entries (GPT-4 uses around 100,000). Language, for a language model, is already a heavily preprocessed artifact. Not sound waves, not handwriting, not the continuous sprawl of experience that language evolved to describe. It starts as a standardized, discretized sequence. And each unit in that sequence is an inherited category. The word "red" encodes a boundary drawn across a continuous electromagnetic spectrum, a boundary negotiated over millennia of human perceptual and linguistic history before a single line of training code was written. The LLM receives that boundary as a given. It computes over the outputs of the carving process. It never touches the carving.
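To make the merging step concrete, here is a minimal sketch of byte-pair-style training on a tiny corpus. It is a toy under stated assumptions: the function names (`train_bpe`, `most_frequent_pair`, `merge_pair`) are mine, it splits on whitespace and works on characters rather than bytes, and production tokenizers add details this leaves out.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Toy trainer (illustrative only): start from characters, merge greedily."""
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # nothing left to merge
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


merges, words = train_bpe("low low low lower lowest", num_merges=2)
print(merges)
print(words)
```

On this corpus, two merges are enough to make "low" a single symbol, while "lower" is still split into "low", "e", "r". The split falls where the frequency counts put it, not at any boundary a linguist would draw.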
Then comes the step that makes transformers, the architecture underlying models like GPT and Claude, unusual. Every token in the sequence is processed in parallel, via a mechanism called attention. Each position "looks at" the other positions in its context and computes how relevant each one is, drawing on patterns learned during training. (In GPT-style models, that context is the position itself plus everything before it; the computation for all positions still runs at the same time.) The word "bank" attends differently to "river" than it does to "money." A pronoun resolves to whatever noun the attention mechanism decides is most probable given everything else in context. This all happens at once, not as a left-to-right scan.
That's a genuinely strange thing. Human reading is sequential: the eye moves, words arrive one after another, comprehension builds over time. Attention in a transformer is nothing like this. It's a single massively parallel operation across the entire input, more like a photograph than a scan. The whole scene at once, not a walk through it.
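For readers who want to see the operation itself, this is a bare-bones sketch of scaled dot-product attention in plain Python. Several things here are my simplifications rather than how any production model is written: the function names, the use of the same vectors as queries, keys, and values (real models apply learned projections first), and the `causal` flag, which I include because GPT-style decoders, as I understand them, restrict each position to itself and earlier positions.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable form)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=True):
    """Scaled dot-product attention over lists of equal-length vectors.

    With causal=True, position i can only attend to positions 0..i.
    Returns (outputs, weights), where weights[i][j] is how much
    position i draws on position j.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # masked out: weight becomes 0
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(dim))
        weights = softmax(scores)
        mixed = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, masked = attention(tokens, tokens, tokens, causal=True)
_, unmasked = attention(tokens, tokens, tokens, causal=False)
print(masked[0])    # first position can only see itself
print(unmasked[0])  # first position spreads weight over all three
```

Each row of weights is independent of the others, which is why the whole thing can be computed in one parallel step instead of a walk through the sequence.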
After that processing, the model produces a probability distribution over its vocabulary (a ranked list of what token is most likely to come next) and samples from it. One token. Then the new sequence, original input plus that token, goes through the whole process again. And again. This is why generating text takes time: the output is constructed one chunk at a time, each chunk requiring a full pass through the model.
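The generate-append-repeat loop is simple enough to sketch directly. Everything in this example is hypothetical: `TOY_MODEL` is a hand-written lookup table standing in for a real forward pass, and it conditions only on the last token, which a real model does not. The loop structure is the part that carries over.

```python
import random

# Hypothetical toy "model": next-token probabilities keyed by the last token.
TOY_MODEL = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.6, "</s>": 0.4},
    "sat": {"</s>": 1.0},
}


def next_token_distribution(sequence):
    """Stand-in for one full forward pass: P(next token | sequence so far)."""
    return TOY_MODEL[sequence[-1]]


def generate(prompt, max_new_tokens=10, seed=0):
    """Sample one token, append it, and run the 'model' again on the longer sequence."""
    rng = random.Random(seed)
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(sequence)
        tokens, probs = zip(*dist.items())
        token = rng.choices(tokens, weights=probs, k=1)[0]
        sequence.append(token)  # the output becomes part of the next input
        if token == "</s>":
            break
    return sequence


print(generate(["<s>"]))
```

Nothing persists between iterations except the sequence itself. Whatever the model "worked out" on one step is available to the next step only through the tokens it emitted.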
Chain-of-thought reasoning is where this gets interesting, and where it connects to the larger argument.
If you ask a language model to solve a hard math problem directly, it often gets it wrong. If you tell it to "think step by step," it gets it right more often. Why? Not because the model is suddenly capable of something it wasn't before. It's because forcing the model to generate intermediate reasoning tokens, to write out steps, changes what information is available when it produces the final answer. The answer token has access to the reasoning tokens. Chain-of-thought is iteration, but the iteration happens in the output, not inside the model. The model doesn't deliberate internally and then show you the result. It generates, appends what it generated to the input, then generates again. The "thinking" is external to the computation. It lives in the token sequence, not in any process happening within a single forward pass.
This is the architectural point that matters. A single forward pass — one complete run through all of the model's computational layers — is one sweep, one direction, no feedback loops. Layers build on layers, representations accumulate, but nothing loops back internally. The architecture's computational depth is also fixed: every input runs the same number of layers, regardless of whether the question is trivial or intractable. There is no mechanism to allocate more computation where it's needed. The model doesn't process its output, revise it, and then show you the revision. What comes out emerged in one pass. Chain-of-thought is a clever workaround for this, not a solution to it. You're externalizing what would ideally be internal iteration into the visible output stream.
This is not how biological neural networks work.
That constraint is what took me back to the Coconut paper.
The Medium Shapes the Computation
The Coconut paper from Meta's research team asked a simple question: what happens if you skip the decoding step between reasoning iterations? In standard chain-of-thought, the model's internal computation gets squeezed through the token bottleneck at every step: continuous, high-dimensional representations collapsed to a single symbol before the next step can begin. Coconut removes that constraint. Instead of converting the model's internal representation to a token and feeding that token back as input, it feeds the representation back directly. The continuous internal state propagates forward. The model reasons in its own internal language rather than in tokens, and only decodes to produce a final answer.
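A toy contrast may help here. This is emphatically not the paper's implementation: the three-dimensional state, the mixing matrix `WEIGHTS`, and the functions `forward`, `collapse_to_token`, and `reason` are all invented for illustration. The only thing it shows is what gets thrown away when each step's result is collapsed to a single choice before being fed back, versus feeding the full continuous vector back.

```python
def forward(state, weights):
    """One toy 'forward pass': a linear map, then normalize so entries sum to 1."""
    raw = [sum(w * s for w, s in zip(row, state)) for row in weights]
    total = sum(abs(r) for r in raw) or 1.0
    return [r / total for r in raw]


def collapse_to_token(hidden):
    """Token bottleneck: keep only the strongest dimension, as a one-hot vector."""
    best = max(range(len(hidden)), key=lambda i: hidden[i])
    return [1.0 if i == best else 0.0 for i in range(len(hidden))]


def reason(start, weights, steps, latent):
    """Iterate the forward pass, feeding back either the raw hidden state
    (latent=True) or its collapsed one-hot version (latent=False)."""
    state = list(start)
    for _ in range(steps):
        hidden = forward(state, weights)
        state = hidden if latent else collapse_to_token(hidden)
    return state


# Made-up numbers: three candidate "hypotheses" with similar initial support.
WEIGHTS = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
START = [0.4, 0.35, 0.25]

print(reason(START, WEIGHTS, steps=3, latent=False))  # one hypothesis survives
print(reason(START, WEIGHTS, steps=3, latent=True))   # all three stay in play
```

In the collapsed version the runner-up hypotheses are gone after the first step, even though they started nearly level with the winner. In the latent version they keep nonzero weight for as long as the iteration runs.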
The result wasn't just faster or more efficient. The reasoning structure changed.
Standard chain-of-thought behaves like depth-first search: the model commits to an approach at each step and follows it out. Each token is a commitment. Once "the answer is probably X" enters the sequence, the next token gets generated in that context. The Coconut models behaved differently. They appeared to hold multiple hypotheses open simultaneously, exploring several branches in parallel and narrowing down as computation progressed. The researchers described this as spontaneous breadth-first search behavior in the continuous latent space. What Coconut achieves is recurrence bolted onto a feedforward architecture: each cycle is a complete forward pass with the output fed back as input, not feedback flowing within the computation the way biological recurrence does.
The difference matters. Depth-first search picks a branch and follows it to the end before trying another. Breadth-first advances all branches at once, pruning the dead ones as evidence accumulates. Breadth-first is generally better when you can't tell which branch will pay off until you've explored it somewhat, which describes most hard reasoning problems. And it's the kind of exploration that's structurally difficult when every step forces a collapse to a single token.
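The two strategies are easy to compare on a small example. The tree below is made up, and the sketch only records the order in which nodes get visited; it does no pruning and says nothing about how a language model would implement either strategy.

```python
from collections import deque

# Hypothetical reasoning tree: each node maps to its possible next steps.
TREE = {
    "start": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["goal", "b2"],
    "a1": [], "a2": [], "goal": [], "b2": [],
}


def depth_first(tree, root, target):
    """Commit to one branch and follow it to the end before backtracking."""
    visited, stack = [], [root]
    while stack:
        node = stack.pop()
        visited.append(node)
        if node == target:
            break
        stack.extend(reversed(tree[node]))  # leftmost child is explored first
    return visited


def breadth_first(tree, root, target):
    """Advance every open branch by one step before going deeper on any."""
    visited, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        visited.append(node)
        if node == target:
            break
        queue.extend(tree[node])
    return visited


print(depth_first(TREE, "start", "goal"))
# ['start', 'a', 'a1', 'a2', 'b', 'goal']
print(breadth_first(TREE, "start", "goal"))
# ['start', 'a', 'b', 'a1', 'a2', 'goal']
```

Depth-first exhausts everything under "a" before it ever looks at "b". Breadth-first has both "a" and "b" open by its third step, which is the property that becomes hard to keep when each step must commit to a single token.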
A follow-up paper used causal probing and adversarial tests to ask whether the latent tokens were actually encoding reasoning, or functioning as opaque placeholders that let the model exploit statistical shortcuts in the training data. The latent tokens were often uninterpretable, and in some cases the models appeared to be pattern-matching rather than reasoning. That matters, but it strengthens the pipeline argument rather than complicating it. Even with latent feedback, the model remained fundamentally a token-prediction machine operating on its training distribution. Moving partway toward continuous representation changed what reasoning strategies were available. It didn't change what the system fundamentally was.
Even the conservative reading leaves something standing. Whatever the Coconut models were doing in their latent states, they were doing something computationally different from what the token-constrained version was doing, and that difference tracked the architectural change. The medium shaped the computation.
That's the part that matters for the consciousness question. Not that Coconut proves anything about machine consciousness; it clearly doesn't. But it demonstrates a principle: the kind of computation a system can perform depends on the level of abstraction it operates at. Move partially away from discrete tokens toward continuous internal representations, and qualitatively different reasoning strategies become available. The same model, different pipeline position, different behavior.
You might expect removing the bottleneck to produce the same reasoning, just faster. What you actually get is a different kind of reasoning.
Which raises the question none of the conversations above were asking: is the computational difference Coconut found the kind of difference that matters for consciousness, or does consciousness require something else that neither architecture has?
A Different Kind of Computation
A transformer does one sweep forward through the input and produces an output. The biological cortex works nothing like that.
Start with the input. Before a signal even reaches the cortex, it's been heavily filtered. The retina contains around 130 million photoreceptors, but only about one million ganglion cell axons form the optic nerve that carries information to the brain. A compression ratio of roughly 100 to 1, at the very first stage. What gets transmitted isn't the full visual field: it's edges, contrasts, changes. The optic nerve sends around 10 megabits per second to the brain, not much more than a cable modem. The cortex is not receiving a faithful stream of reality. It's receiving a curated summary of what changed.
What the cortex does with that signal is stranger still. The organizing principle, now supported by decades of experimental work though still debated in its details, is predictive coding. The cortex is a hierarchical prediction machine. Higher layers continuously send predictions downward about what the layers below them should be seeing. Lower layers compare those predictions to the actual arriving signal and send up only the error: what actually arrived minus what was predicted. The surprise. The mismatch. What propagates upward through the hierarchy isn't raw sensory data. It's what the brain's current model of the world got wrong.
This is computationally elegant. If the brain's model is good, most neurons can stay quiet most of the time. Only the mismatches need to travel. This explains a striking empirical fact: at any given moment, only about one to five percent of cortical neurons are firing. That sparsity isn't a design constraint imposed on top of the computation. It follows structurally from what the computation is doing. Detecting changes and boundaries in a continuous signal is inherently sparse, because boundaries are rare — at any moment, most of any continuous signal is stable interior, not edge. A system that detects boundaries rather than representing entire fields activates sparsely as a consequence of its architecture. The whole brain runs on roughly 20 watts, less than a dim light bulb, partly as a consequence.
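The link between error signaling and sparsity shows up even in a deliberately crude sketch. The predictor here is my own stand-in ("the next sample will equal the last one"), nothing like a cortical hierarchy, and `prediction_errors` is a hypothetical name. The point is only that transmitting mismatches, rather than the signal, leaves most time steps silent.

```python
def prediction_errors(signal, threshold=0.0):
    """Predict each sample as 'same as the previous one' and keep only mismatches.

    Returns a list of (time_step, error) pairs, where error is
    actual minus predicted.
    """
    errors = []
    prediction = signal[0]
    for t, actual in enumerate(signal[1:], start=1):
        error = actual - prediction
        if abs(error) > threshold:
            errors.append((t, error))
        prediction = actual  # update the model with what actually arrived
    return errors


# A mostly stable signal with two boundaries.
signal = [0] * 10 + [1] * 10 + [0] * 10
errors = prediction_errors(signal)
print(errors)  # [(10, 1), (20, -1)]
print(f"{len(errors)} of {len(signal) - 1} steps carried any message")
```

Thirty samples, two messages. The sparsity is not imposed from outside; it falls out of the fact that the signal is mostly interior and only occasionally edge.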
This also means the biological cortex is not feedforward. It's bidirectional. Predictions flow down. Errors flow up. Both simultaneously, continuously. The signals are analog, carried not by tokens drawn from a fixed vocabulary but by the rate and timing of electrical spikes. This distinction matters more than it might seem. Firing rate and spike timing are two separate information channels. Two neurons with identical average firing rates can carry completely different signals if their timing differs by milliseconds. Standard artificial networks, transformers included, have nothing corresponding to this second channel; there is no timing dimension in the computation. The information that spike timing encodes isn't just absent from transformers; it's a dimension the architecture has no way to represent. Communication between regions is dynamically routed. Neuronal populations coordinate through synchronized oscillations, addressing information to wherever it's relevant at that moment. The architecture is less like a pipeline and more like a conversation happening at multiple levels at once.
The units doing this computation are also far more complex than the nodes in an artificial neural network. A single pyramidal neuron in the cortex has a branching dendritic tree. That tree performs its own nonlinear computations before the signal even reaches the cell body. The "neurons" in a transformer are simple by comparison: a weighted sum passed through a fixed nonlinearity. Biological neurons are local computational systems in their own right.
The learning rule is different in kind. Transformers learn by gradient descent, an iterative process that adjusts billions of parameters to minimize prediction error, over a fixed training dataset. All weight adjustment happens during training (or a later fine-tuning run); nothing changes the weights while the deployed model is answering a query. Biological synapses update continuously, driven by the precise timing of spikes: a synapse strengthens when the sending neuron fires just before the receiving one, and weakens when the order reverses. The system learns from the signal it's currently processing, all the time, without a separate training phase. Coconut does not change this. Feeding a hidden state back as input modifies what the model sees, not the model itself. The weights remain fixed throughout inference.
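The timing rule for synapses can be written down in a few lines. This sketch uses the common pair-based form with an exponential window; the function name and the parameter values (`a_plus`, `a_minus`, `tau_ms`) are illustrative assumptions, not measurements from any particular preparation.

```python
import math


def stdp_weight_change(t_pre_ms, t_post_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    """Pair-based spike-timing-dependent plasticity (illustrative parameters).

    Positive result: the synapse strengthens (sender fired before receiver).
    Negative result: the synapse weakens (receiver fired before sender).
    The effect decays exponentially as the two spikes get further apart.
    """
    dt = t_post_ms - t_pre_ms
    if dt > 0:
        return a_plus * math.exp(-dt / tau_ms)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_ms)
    return 0.0


print(stdp_weight_change(t_pre_ms=10.0, t_post_ms=15.0))  # sender first: positive
print(stdp_weight_change(t_pre_ms=15.0, t_post_ms=10.0))  # receiver first: negative
print(stdp_weight_change(t_pre_ms=10.0, t_post_ms=60.0))  # far apart: near zero
```

The update depends only on two local spike times, so it can run on every synapse continuously, with no global error signal and no separate training phase. That is the contrast with gradient descent this section is drawing.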
There's a way to see these two architectures as occupying different positions in a longer chain of transformations. At one end is undifferentiated sensory signal. Working toward the other: active boundary-making from that signal (what the visual and auditory cortex do), the stabilized perceptual categories that those boundaries become, the linguistic encoding of those categories into named symbols, and finally tokens: pre-made, pre-named, already separated from whatever continuous signal they originally indexed. The biological cortex operates near the generative end of this chain. It receives compressed but still-continuous signal and actively carves boundaries from it. This isn't a metaphor. The visual cortex doesn't represent boundary-making; it is boundary-making, physically instantiated. LLMs operate at the other end. They receive tokens whose categories were established by the entire prior chain and compute over them. They are downstream of the carving, not participants in it. One catch: tokens are not arbitrary symbols; they carry centuries of human perceptual and linguistic history in their category boundaries. LLMs may be less purely downstream of the carving than this framing suggests.
Coconut moves the dial partway. The model still starts from tokens and never touches raw continuous signal. But when it reasons in its own continuous internal representations rather than squeezing through the token bottleneck at each step, it gains access to computational strategies that the fully discrete regime couldn't support. That this produces measurably different reasoning behavior is evidence that where a system sits in this chain matters for what computation it can perform.
These are genuinely different kinds of computation. One is a single directed sweep over a discrete symbolic sequence. The other is a continuous, bidirectional process operating on analog signal, updating its model in real time, with feedback flowing at every level simultaneously.
When Novella said LLMs don't have the architecture consciousness requires, he was right. But the interesting question isn't what hardware human consciousness runs on. It's what that hardware is doing that a transformer isn't, and whether whatever that is, is the part that matters.
What I kept wanting was a cleaner question: set aside the hardware prerequisites, set aside the outputs. What is computationally different between a biological cortex and a large language model, and does any of that difference map onto what we actually know about how consciousness works?
The Missing Level
The substrate argument correctly identifies what biological consciousness requires, and correctly concludes LLMs don't have it. But knowing what the substrate looks like doesn't tell you what it's doing that produces consciousness. The argument can't answer what actually matters: is that arrangement necessary because of something specific to that biological configuration, or because of the kind of computation that configuration is running?
Novella's blog argument gets closer. He argued that you can't distinguish genuine understanding from sophisticated mimicry on outputs alone, and that evaluating consciousness requires knowing something about the underlying process. But the argument stops at "the process is opaque," treating opacity as grounds for skeptical agnosticism rather than as a question to examine. The question isn't only whether we can know anything about the process; it's what comparing the two architectures tells us once we look.
If the substrate itself is what's necessary, the argument is complete. If it's the computational operation (continuous bidirectional processing, internal iteration, analog signal, real-time model updating) then identifying the hardware tells you what works without establishing it's the only thing that could work. Those are very different claims, and the substrate argument can't distinguish between them.
Dawkins made the opposite error. He was watching outputs and inferring inward. But outputs don't reveal computational regime. A system doing one-directional sweeps over discrete token sequences can produce text behaviorally indistinguishable from a system doing continuous bidirectional processing. Two systems can run identical mathematics on completely different physics: ocean waves and sound waves share the same wave equation without sharing anything about what's actually moving. The same caution applies here: behavioral indistinguishability in the output doesn't tell you whether the underlying computation is even the same kind of thing. The architecture doesn't show up in the output. What Dawkins saw was what the system produced. That's orthogonal to what the system is doing to produce it.
The computational level between substrate and behavior exists, and comparing these two architectures gives it vocabulary. Chain-of-thought externalizes iteration into the visible output because there's nowhere else for it to happen. Coconut is evidence the same principle scales: change the regime, and the reasoning changes in kind. These aren't differences in degree.
The question this generates: is the computational difference the right difference to care about? Is what's absent from LLMs (the bidirectionality, the analog signal, the internal iteration, the continuous model updating) what consciousness requires? Or is consciousness sensitive to something else that both architectures lack, or that no one has yet identified?
One direction worth examining is whether operating at the generative end of that pipeline (where a system actively carves boundaries from continuous signal rather than computing over pre-made ones) matters for the kind of ongoing model-updating consciousness appears to require. What would confirm that remains as open as the larger question.
What to Ask Instead
The consciousness question is worth taking seriously, not because the evidence for LLM consciousness is strong, but because we are building systems computationally unlike anything that has existed before, and the question of what produces experience is one we'll need to answer to make sense of what we're building.
We now have the vocabulary to ask it at the right level. The question is what computational operations are necessary for consciousness (necessary in the sense that no known conscious system lacks them), and which of those operations any given system actually performs. Not a philosophical question dressed up as an empirical one: the computational level between substrate and behavior isn't just posited; it's detectable.
But detectable isn't the same as explanatory. Measuring that two computational regimes differ doesn't tell you which differences are consciousness-relevant, and it can't, not without a prior theory of why any computation would produce experience rather than merely different behavior. That theory doesn't exist yet. What exists is a better-framed question and the beginning of the tools to probe it.
The test that would tell us whether the features separating these two regimes are the consciousness-relevant ones can't be specified until we have some account of what we're actually testing for. Building that account is the prior work.
I don't know what consciousness requires. Nobody does. But the question can be asked more precisely than it has been, and naming the gap clearly is where that precision starts. The answer will depend on what happens when researchers keep moving the computational dial, and on whether they can say, in advance, what finding would count as an answer.